This article is the second in the series "Use Kubernetes to host .NET 8 applications":
Scaling .NET 8 Applications in Kubernetes (this article)
Hosting .NET 8 Applications in Azure Kubernetes Service (AKS)
The previous article focused on containerizing and hosting a simple .NET 8 console application in Kubernetes. We started by creating a test application and crafting a Dockerfile to build a Docker image of the application. We then explored the basics of Kubernetes, setting up a local Kubernetes cluster using Docker Desktop, and deploying our application container within a Kubernetes pod. Finally, we addressed state persistence challenges by configuring a hostPath
volume, ensuring our application's progress is saved across pod restarts.
In this article, I will cover a few vital components of Kubernetes that help deliver and scale applications.
Test application
The test application from the previous article emulates some work and saves its progress in a text file. You can follow the progress in two ways: by observing the application logs using kubectl logs or by reading the configuration file.
This requires access to Kubernetes and its components. In typical scenarios, users don't have this access. Imagine if, when uploading a video to YouTube, you had to log into the servers to check the status. First, Google wouldn't allow it. Second, it's a poor user experience.
To eliminate this problem, we will create an API with an endpoint that shows the current operation state (the source code). Here's a snippet of the code.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

const string configFileName = "/configuration/step.config";

app.MapGet("/api/state", () =>
{
    if (File.Exists(configFileName))
    {
        return Results.Ok(File.ReadAllText(configFileName));
    }
    return Results.BadRequest();
});

app.MapGet("/api/health", () =>
{
    return Results.Ok(Environment.MachineName);
});

app.Run();
The API has two GET endpoints:
/api/state shows the current operation state, performed by the background process.
/api/health indicates if the API works as expected and returns the machine name.
The Dockerfile and pod configuration files are similar to the ones used for the background process, so they are not included in this article.
Deployment
The way we created a pod in the previous article has a few drawbacks, the main ones being the difficulty of scaling and the fact that the pod is not recreated automatically when it dies.
A Deployment in Kubernetes is like a blueprint for your application. It defines the desired state for your pods and ensures your cluster matches this state. Whether you need to roll out a new version of your app, scale up to handle increased traffic or roll back to a previous version in case of issues, Deployments have you covered.
Deployments make updating your application seamless. You update the Deployment configuration, and Kubernetes takes care of the rest—creating new pods, managing old ones, and ensuring zero downtime. This powerful feature helps you maintain high availability and resilience for your .NET 8 applications.
Next, we will create the deployment for our API service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-kube-api-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-kube-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: hello-kube-api
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: hello-pvc
      containers:
        - name: hello-kube-api
          image: hello-kube-api:1.0
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: data
              mountPath: /configuration
The deployment file looks bulky, but if you look closely, you’ll notice that some parts are similar to the pod configuration and only half is new. It's better to start with the new parts. All configuration elements are under the spec element, so I will omit it for clarity.
replicas: 2 ensures that two instances (pods) of our application are running at all times, providing redundancy and load balancing.
selector with matchLabels tells Kubernetes which pods are managed by this Deployment. Here, it selects pods with the label app: hello-kube-api.
The strategy configuration defines the update strategy: RollingUpdate or Recreate. The RollingUpdate strategy ensures zero downtime by incrementally updating pods and maintaining application availability throughout the process. You can control the update pace with parameters like maxUnavailable and maxSurge, ensuring a seamless user experience. On the other hand, the Recreate strategy shuts down all existing pods before starting new ones, resulting in downtime. This straightforward approach is suitable for non-critical applications or when updates require significant changes incompatible with running pods.
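If downtime were acceptable, the alternative strategy block would be much shorter. The sketch below is for illustration only and is not used in this article's deployment; note that Recreate takes no rollingUpdate sub-settings:

```yaml
strategy:
  # All existing pods are terminated before any new ones are created,
  # so the application is briefly unavailable during the update.
  type: Recreate
```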
To ensure zero downtime by gradually replacing old pods with new ones, type: RollingUpdate suits us best.
The rollingUpdate element allows adjusting the strategy more thoroughly.
maxUnavailable: 1 allows only one pod to be unavailable during the update process.
maxSurge: 1 specifies that one additional pod can be created during the update, ensuring there’s always an extra pod available during the rollout.
The template describes the pod that will be created with this deployment. It specifies the container and the volume. This configuration is similar to the one used for creating the pod, with the only difference being that we use the API image and, instead of mounting hostPath, we use persistentVolumeClaim.
Note that template.metadata sets the app label to hello-kube-api, which matches the matchLabels selector.
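Since the API already exposes /api/health, that endpoint could also back a readiness probe, so a pod only receives traffic once it responds. The probe block below is an assumption on my part, not part of this article's deployment file; it would go under the container definition:

```yaml
# Hypothetical addition under spec.template.spec.containers[0]
readinessProbe:
  httpGet:
    path: /api/health   # the health endpoint from our API
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```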
In Kubernetes, PersistentVolume (PV) and PersistentVolumeClaim (PVC) are essential for managing storage. A PersistentVolume is a piece of storage provisioned by an administrator or dynamically created using Storage Classes. It's independent of any individual pod, ensuring your data persists beyond the lifecycle of pods. On the other hand, a PersistentVolumeClaim is a request for storage by a user. When you create a PVC, Kubernetes finds a PV that matches the claim’s requirements, binding them together. This separation allows for dynamic and flexible storage management, ensuring your applications have the persistent storage they need.
The configurations for PV and PVC needed for our applications are below.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hello-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hello-pvc
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
By defining these PV and PVC we create a persistent storage solution that all our Kubernetes pods can use, ensuring data is retained even if the pods are terminated or rescheduled.
We can apply the deployment and storage configurations in any order, but we have to understand that the deployment's pods cannot start until the storage is provided. Thus, I recommend provisioning the storage first and only then applying the deployment configuration.
kubectl apply -f storage.yml
kubectl apply -f api-deploy.yml
As usual, we can use kubectl get to check the status of our deployment.
> kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
hello-kube-api-deploy 2/2 2 2 10m
The interesting part is that we asked for two replicas of our pods instead of the single pod we had for our background process. We can see more details using the describe command, which returns detailed information about the deployment, pods, volumes, and events.
> kubectl describe deployments hello-kube-api-deploy
Name: hello-kube-api-deploy
Namespace: default
CreationTimestamp: Sun, 23 Jun 2024 11:44:58 +0200
Labels: <none>
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=hello-kube-api
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: app=hello-kube-api
Containers:
hello-kube-api:
Image: hello-kube-api:1.0
Port: 8080/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/configuration from data (rw)
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: hello-pvc
ReadOnly: false
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: hello-kube-api-deploy-b6686ccc6 (2/2 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 34s deployment-controller Scaled up replica set hello-kube-api-deploy-b6686ccc6 to 2
Right now we have two applications running in Kubernetes: the background process and the API.
NAME READY STATUS RESTARTS AGE
app-pod 1/1 Running 2 (62m ago) 33d
hello-kube-api-deploy-b6686ccc6-kxc9j 1/1 Running 0 59m
hello-kube-api-deploy-b6686ccc6-sc7lj 1/1 Running 0 59m
Because we created the replica set with two pods in it, Kubernetes needs to assign a unique name to each pod. For that, it uses the following format:
[the name of the deployment]-[unique identifier of the deployment]-[unique identifier of the pod]
Load Balancing and Service Configuration
Our application works in Kubernetes, but we still haven't seen the results. Creating a frontend with a UI is outside the scope of this article, so we will try to call our API to see the result.
Kubernetes assigns a private cluster IP for each pod in a cluster so that pods can communicate. This private IP is not accessible from the outside world. Each pod has a unique name and IP, which we can use to communicate with them. However, relying on pod names and IPs is impractical because pods can be deleted and recreated with different names and IP addresses at any time. To solve this issue, Kubernetes uses an abstraction called Services.
Services are essential for enabling communication between different components of an application. A Service defines a logical set of pods and a policy to access them. Kubernetes offers different types of services:
ClusterIP for internal communication within the cluster.
NodePort for exposing services on each node's IP at a static port.
LoadBalancer for exposing services externally using a cloud provider's load balancer.
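For comparison, here is what a ClusterIP service for the same pods could look like. This is only an illustrative sketch, not part of this article's setup, and the service name is hypothetical; ClusterIP is also the default when type is omitted:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-kube-internal-svc   # hypothetical name, for illustration only
spec:
  type: ClusterIP                 # reachable only from inside the cluster
  ports:
    - port: 8080
      protocol: TCP
  selector:
    app: hello-kube-api           # the same API pods
```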
We want to access our pods from the external world, so we will use a LoadBalancer service.
apiVersion: v1
kind: Service
metadata:
  name: hello-kube-lb-svc
  labels:
    app: hello-kube-api
spec:
  type: LoadBalancer
  ports:
    - port: 8080
      protocol: TCP
  selector:
    app: hello-kube-api
This configuration creates a LoadBalancer service named hello-kube-lb-svc that listens on port 8080 and directs traffic to our API pods labeled app: hello-kube-api.
Apply the configuration with:
kubectl apply -f api-lb.yml
Check the status of the service:
kubectl get svc
And use describe to see more details:
kubectl describe svc hello-kube-lb-svc
Testing the applications
We are finally ready to test our applications. Start by checking the health endpoint: navigate to http://localhost:8080/api/health in your browser.
You should see a 200 OK response with the text hello-kube-api-deploy-[unique identifier], which matches the pod name. Refresh the page a few times, and you will notice different responses as the load balancer directs traffic to different pods.
If you delete a pod:
kubectl delete pod [pod-name]
and refresh the page again, you should still get a 200 OK response, but from a different pod, indicating that Kubernetes has recreated the deleted pod and the service is working as expected.
Let's scale our application to have five replicas of the API. Modify the spec.replicas
parameter in the deployment configuration and apply the changes:
kubectl apply -f api-deploy.yml
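For reference, the only change to the deployment file is the replica count; everything else stays exactly as shown in the Deployment section:

```yaml
spec:
  replicas: 5   # previously 2
```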
Verify the number of running pods:
kubectl get pods
You should see five pods running:
NAME READY STATUS RESTARTS AGE
app-pod 1/1 Running 0 33d
hello-kube-api-deploy-b6686ccc6-8kkql 1/1 Running 0 19s
hello-kube-api-deploy-b6686ccc6-8vmzs 1/1 Running 0 9m16s
hello-kube-api-deploy-b6686ccc6-hh9hd 1/1 Running 0 19s
hello-kube-api-deploy-b6686ccc6-kxc9j 1/1 Running 0 126m
hello-kube-api-deploy-b6686ccc6-prbmc 1/1 Running 0 19s
Finally, let's inspect the state endpoint to check the status of the background operation. The endpoint is served by the API pods, while the value it returns is produced by the background-process pod, with the result persisting in the shared volume.
Navigate to http://localhost:8080/api/state in your browser. You should see a response like 1293. Recheck it after a few minutes, and the number should increment, indicating the background process is working correctly.
Summary
In this article, we build on our previous work containerizing a .NET 8 application by focusing on scaling and managing the application within Kubernetes. We introduce a new API service for monitoring the application's state and health, and we create a Kubernetes Deployment to ensure the application runs smoothly with zero downtime. Persistent storage is managed using PersistentVolumes (PV) and PersistentVolumeClaims (PVC), ensuring data persistence across pod restarts. Finally, we configure a LoadBalancer service to expose the API to external traffic, enabling us to test and verify the application's scalability and resilience.
Links
Persistent Volumes | Kubernetes
Image credits: Cargo Container Lot by Chanaka