Originally published on Spice Works
Kubernetes storage is useful for managing multiple forms of persistent and non-persistent storage in a cluster to cater to both stateful and stateless workloads in a containerized environment. Proper management of Kubernetes storage options allows us to dynamically provision the most suitable storage resources for multiple applications with minimal administration overhead.
This article covers the storage concepts you can leverage in Kubernetes, their typical use cases, and pro tips for running Kubernetes on AWS from a storage perspective.
Introduction to Kubernetes Storage Concepts
The storage architecture for Kubernetes is based on volumes as the core abstraction. Volumes can be ephemeral (non-persistent) or persistent, depending on their intended use cases. Kubernetes also allows for the dynamic provisioning of storage resources through persistent volume claims and storage classes.
Volumes are the basic storage entities in Kubernetes. A process in the container sees a filesystem view, which includes a root filesystem that matches the initial contents of a container image and the volumes mounted inside the container (if defined within the container specifications).
Volumes are mounted at specified paths within the container image, so you must independently specify the mount path of each volume used by each container in the pod. Volumes cannot be mounted within other volumes, but a single volume can be shared by multiple containers in a pod, and the subPath field of a volume mount lets a container mount a sub-directory of the volume instead of its root, so one volume can serve several uses.
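To make the mount rules concrete, here is a minimal sketch that builds a Pod manifest as a Python dict, assuming an emptyDir volume and the public busybox image; the pod, container, volume, and path names are hypothetical placeholders. The volume is declared once at the pod level, each container lists its own mount paths, and subPath selects a sub-directory of the volume.

```python
import json


def build_subpath_pod() -> dict:
    """Build a Pod manifest in which two containers share one volume.

    All names, images, and paths here are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "subpath-demo"},
        "spec": {
            # The volume is declared once, at the pod level.
            "volumes": [{"name": "shared-data", "emptyDir": {}}],
            "containers": [
                {
                    "name": "writer",
                    "image": "busybox:1.36",
                    "command": ["sh", "-c", "while true; do date >> /var/log/app/out.log; sleep 5; done"],
                    # Each container lists its own mount paths. subPath mounts a
                    # sub-directory of the volume instead of its root, so one
                    # volume can hold both logs and a cache.
                    "volumeMounts": [
                        {"name": "shared-data", "mountPath": "/var/log/app", "subPath": "logs"},
                        {"name": "shared-data", "mountPath": "/cache", "subPath": "cache"},
                    ],
                },
                {
                    "name": "reader",
                    "image": "busybox:1.36",
                    "command": ["sh", "-c", "sleep 3600"],
                    # Same volume and sub-directory, mounted read-only at a different path.
                    "volumeMounts": [
                        {"name": "shared-data", "mountPath": "/logs", "subPath": "logs", "readOnly": True},
                    ],
                },
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. `kubectl apply -f pod.json`.
    print(json.dumps(build_subpath_pod(), indent=2))
```

The reader container sees the writer's log file at /logs, while the cache directory stays private to the writer even though both live on the same volume.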
Kubernetes supports many types of storage volumes, including local storage devices, Network File System (NFS) shares, and cloud storage services such as Amazon Elastic Block Store (EBS) volumes. Developers can also create custom storage plugins as extensions to support specific storage systems used with Kubernetes clusters.
While support for some cloud storage services is currently built into the core Kubernetes project, the Kubernetes Storage Special-Interest Group (k8s-sig-storage) is gradually shifting toward providing storage support through external Container Storage Interface (CSI) drivers. For example, the in-tree Amazon Elastic Block Store volume plugin has been deprecated since v1.17 in favor of the Amazon EBS CSI driver.
Files that a container writes to its own filesystem are ephemeral: if the container crashes, it is restarted with a clean state and those files are lost. Ephemeral volumes such as emptyDir, declared in the pod specification, instead provide a temporary storage directory on the machine that hosts the pod. Ephemeral volumes are removed after the pod ceases to exist.
The data in an ephemeral volume is safe if a container crashes, because a container crash does not remove the pod from its node. However, if the pod itself is deleted, for example when it is evicted from its node and its controller creates a replacement on another host, the data in its ephemeral volumes is removed along with it.
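A minimal sketch of an ephemeral volume, assuming an emptyDir with a size limit; the pod name, image, and command are hypothetical. Because the container appends a line to a file on the volume each time it starts, the file would gain a line after each container restart, yet it disappears once the pod is deleted.

```python
import json


def build_scratch_pod(name: str = "scratch-demo") -> dict:
    """Pod with an emptyDir scratch volume (names and image are placeholders)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "worker",
                    "image": "busybox:1.36",
                    # Appends one line per container start.
                    "command": ["sh", "-c", "date >> /scratch/starts.log; sleep 3600"],
                    "volumeMounts": [{"name": "scratch", "mountPath": "/scratch"}],
                }
            ],
            # emptyDir starts empty when the pod is assigned to a node, keeps its
            # contents across container restarts, and is deleted with the pod.
            # Exceeding sizeLimit gets the pod evicted.
            "volumes": [{"name": "scratch", "emptyDir": {"sizeLimit": "500Mi"}}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_scratch_pod(), indent=2))
```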
Persistent Volumes (PVs) and Persistent Volume Claims (PVCs)
Kubernetes uses persistent volumes and persistent volume claims to let pods use storage portably, abstracting how storage is implemented away from how it is consumed.
A persistent volume (PV) is a storage entity within a cluster that is either allocated manually by an administrator or dynamically allocated based on a storage class. PVs define the details of the storage implementation, such as capacity, access modes, storage class, and reclaim policy. As they are cluster resources, PVs are not portable between clusters.
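As an illustration of the details a PV carries, here is a minimal sketch of a statically provisioned PV, assuming an existing EBS volume exposed through the Amazon EBS CSI driver. The volume ID, the PV name, and the "manual" storage class name are hypothetical placeholders. A production EBS-backed PV would usually also pin itself to the volume's availability zone with node affinity, which this sketch omits.

```python
import json


def build_static_ebs_pv(volume_id: str, size: str = "20Gi") -> dict:
    """PersistentVolume for an existing EBS volume (all values are placeholders)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",  # cluster-scoped: no namespace
        "metadata": {"name": "static-ebs-pv"},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical class name; for static PVs it is matched as a plain string.
            "storageClassName": "manual",
            "persistentVolumeReclaimPolicy": "Retain",
            # Storage implementation details: served by the Amazon EBS CSI driver.
            "csi": {"driver": "ebs.csi.aws.com", "volumeHandle": volume_id, "fsType": "ext4"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_static_ebs_pv("vol-0123456789abcdef0"), indent=2))
```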
A persistent volume claim (PVC) is a storage request that developers use to describe an application’s storage requirements, such as storage size and access mode. Since the storage request is separate from the storage creation, developers can request storage without managing the underlying volumes, and access to storage is governed through claims: Kubernetes matches each claim against the PVs available in the cluster, and a pod can use a PV only through a PVC in its own namespace.
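From the developer's side, a claim only states what is needed. A minimal sketch, with a hypothetical helper name and defaults: the claim names neither a volume nor a storage class, so which PV ends up serving it is left to the cluster.

```python
import json


def build_basic_claim(name: str, size: str, access_mode: str = "ReadWriteOnce",
                      namespace: str = "default") -> dict:
    """PersistentVolumeClaim stating only what the application needs."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        # PVCs are namespaced, unlike PVs.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_basic_claim("app-data", "10Gi"), indent=2))
```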
A PV’s lifecycle is independent of the pods that use it. The lifecycle of a PV and its PVC consists of five stages:
- Provisioning: A PV is created either statically, by a cluster administrator, or dynamically, by the cluster based on a storage class.
- Binding: When a PVC is created, a PV that fulfills its requirements is assigned and bound to it.
- Using: Pods consume the PV’s storage by referencing the PVC as a volume.
- Releasing: When the PVC is deleted, the PV is released from the claim.
- Reclaiming: The PV’s reclaim policy then determines what happens to it: Retain keeps the volume and its data for manual cleanup, while Delete removes the PV along with the underlying storage asset.
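The binding, releasing, and reclaiming stages can be sketched as a toy model. This is a deliberate simplification, not the real Kubernetes controller: it ignores dynamic provisioning, assumes sizes in whole GiB, and picks the smallest available PV whose class, capacity, and access modes satisfy the claim.

```python
from dataclasses import dataclass
from typing import List, Optional

# Simplified model of the PV/PVC lifecycle stages; NOT the real Kubernetes controller.


@dataclass
class PV:
    name: str
    capacity_gi: int
    access_modes: frozenset
    storage_class: str
    reclaim_policy: str = "Delete"  # "Retain" or "Delete"
    phase: str = "Available"
    claim: Optional[str] = None


@dataclass
class PVC:
    name: str
    request_gi: int
    access_modes: frozenset
    storage_class: str
    volume: Optional[str] = None  # None means the claim is still Pending


def bind(pvc: PVC, pvs: List[PV]) -> Optional[PV]:
    """Binding: pick the smallest Available PV that satisfies the claim."""
    candidates = [
        pv for pv in pvs
        if pv.phase == "Available"
        and pv.storage_class == pvc.storage_class
        and pv.capacity_gi >= pvc.request_gi
        and pvc.access_modes <= pv.access_modes  # PV must offer every requested mode
    ]
    if not candidates:
        return None  # no match: the claim stays Pending
    pv = min(candidates, key=lambda p: p.capacity_gi)
    pv.phase, pv.claim = "Bound", pvc.name
    pvc.volume = pv.name
    return pv


def release(pvc: PVC, pvs: List[PV]) -> None:
    """Releasing and reclaiming: delete the claim, then apply the PV's reclaim policy."""
    for pv in list(pvs):  # iterate over a copy because items may be removed
        if pv.claim == pvc.name:
            pv.phase = "Released"
            if pv.reclaim_policy == "Delete":
                pvs.remove(pv)  # the PV and its backing storage go away
            # "Retain": the PV stays Released, with its data, until an admin cleans it up
    pvc.volume = None
```

In this model a Released PV with the Retain policy is never handed to a new claim, which mirrors why retained volumes need manual cleanup before reuse.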
For Kubernetes on AWS, EBS volumes are a common backing store for persistent volumes. An EBS volume resides in a single availability zone and can only be attached to EC2 instances in that zone, so a pod that uses it must run on an EKS worker node in the same region and availability zone. When a pod is removed from an EC2 instance node in the EKS cluster, the EBS volume mounted to the pod is unmounted and its data persists.
Cluster administrators can configure storage classes in the cluster and assign PVs to each class. Each class represents a particular type of storage that users can request in their PVCs, depending on varying workload requirements within the cluster.
Dynamic Volume Provisioning with Storage Classes
Dynamic volume provisioning is a Kubernetes feature that creates storage volumes on demand, so cluster administrators do not have to create them manually.
Each storage class specifies a volume plugin, called a provisioner, and the parameters it needs to allocate storage volumes dynamically. When a user names a storage class in a PVC, that class’s provisioner automatically creates a storage volume that meets the claim’s specifications.
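A minimal sketch of dynamic provisioning on EKS, assuming the Amazon EBS CSI driver is installed; the class and claim names are hypothetical. The storage class names the provisioner and its parameters, and the claim triggers provisioning simply by naming the class. WaitForFirstConsumer delays volume creation until a pod uses the claim, which suits zonal storage such as EBS.

```python
import json

# Assumes the Amazon EBS CSI driver is installed; "ebs-gp3" and "app-data" are hypothetical.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "ebs-gp3"},
    "provisioner": "ebs.csi.aws.com",  # the volume plugin that creates volumes
    "parameters": {"type": "gp3"},     # provisioner-specific: EBS volume type
    "reclaimPolicy": "Delete",
    # Delay provisioning until a pod uses the claim, so the EBS volume is
    # created in the availability zone where that pod is scheduled.
    "volumeBindingMode": "WaitForFirstConsumer",
}

claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},
    "spec": {
        "storageClassName": "ebs-gp3",  # naming the class triggers dynamic provisioning
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

if __name__ == "__main__":
    # A v1 List lets kubectl apply both objects from one file.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [storage_class, claim]}, indent=2))
```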
While some provisioners are internal and shipped alongside the Kubernetes project, you can also use external provisioners by following the specifications defined by Kubernetes.
Container Storage Interfaces (CSIs)
The Container Storage Interface (CSI) is a standard interface through which Kubernetes exposes storage systems to containerized workloads, giving vendors an extensible plugin architecture for creating compatible storage plugins.
These plugins take the form of CSI drivers for arbitrary storage systems external to your Kubernetes cluster, such as Amazon Elastic File System (EFS).
After you deploy a CSI driver on the Kubernetes cluster, you can use its volumes with other Kubernetes storage API objects such as PVs, PVCs, and storage classes. For example, you can create a storage class that names the CSI driver as its provisioner, request storage from that class in a PVC, and reference the PVC in a pod to mount the CSI volume.
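A minimal sketch of that chain for Amazon EFS, assuming the Amazon EFS CSI driver is deployed and an EFS file system already exists. The file system ID and all object names are placeholders; provisioningMode, fileSystemId, and directoryPerms are parameters of the EFS CSI driver's dynamic provisioning mode.

```python
import json

# Assumes the Amazon EFS CSI driver is deployed and an EFS file system exists.
# The file system ID and all object names are placeholders.
efs_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "efs-sc"},
    "provisioner": "efs.csi.aws.com",  # the CSI driver acts as the provisioner
    "parameters": {
        "provisioningMode": "efs-ap",  # one EFS access point per provisioned volume
        "fileSystemId": "fs-0123456789abcdef0",
        "directoryPerms": "700",
    },
}

shared_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "shared-files"},
    "spec": {
        "storageClassName": "efs-sc",
        "accessModes": ["ReadWriteMany"],  # EFS can be mounted by many nodes at once
        # Kubernetes requires a size, but EFS itself grows elastically.
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "efs-app"},
    "spec": {
        "containers": [{
            "name": "app",
            "image": "busybox:1.36",
            "command": ["sh", "-c", "sleep 3600"],
            "volumeMounts": [{"name": "shared", "mountPath": "/shared"}],
        }],
        # The pod only references the claim, never the CSI volume directly.
        "volumes": [{"name": "shared", "persistentVolumeClaim": {"claimName": "shared-files"}}],
    },
}

if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [efs_class, shared_claim, pod]}, indent=2))
```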
As the core Kubernetes project gradually hands control over provisioning non-native storage to the respective providers, external vendor-based CSI drivers are currently the preferred way to manage the lifecycle of external storage systems in Kubernetes clusters. As part of this shift, migration features are being released within the core project to ease the transition from each in-tree storage plugin to the corresponding vendor-based CSI driver.
Use Cases for Kubernetes Storage
With various options available for Kubernetes storage, you need to know which type to choose for different use cases.
Persistent vs. Ephemeral Storage
For transient applications that need storage only while they run (for example, scratch space, or read-only input data supplied as files, such as configuration), ephemeral volumes store data for the lifetime of the pod, so the pod is not tied to the location and availability of a particular persistent volume.
For applications that must keep data across pod restarts and rescheduling, such as databases, persistent volumes store data beyond the lifetime of any single pod.
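The choice above boils down to whether data must outlive the pod. A minimal sketch of that decision as a hypothetical helper that returns the volume entry for a pod spec: an emptyDir for transient data, or a reference to an existing PVC for data that must persist. The helper name, parameters, and claim names are illustrative only.

```python
from typing import Optional


def volume_for(name: str, needs_persistence: bool, claim_name: Optional[str] = None) -> dict:
    """Return a pod-spec volume entry: emptyDir for transient data, a PVC otherwise.

    The helper and its parameters are hypothetical, for illustration only.
    """
    if needs_persistence:
        if not claim_name:
            raise ValueError("persistent storage requires the name of a PVC")
        # Data outlives the pod: it lives on the PV bound to this claim.
        return {"name": name, "persistentVolumeClaim": {"claimName": claim_name}}
    # Data lives only as long as the pod does.
    return {"name": name, "emptyDir": {}}


# Usage: scratch space for a batch job vs. the data directory of a database.
print(volume_for("scratch", needs_persistence=False))
print(volume_for("db-data", needs_persistence=True, claim_name="postgres-data"))
```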
Varying Workload Requirements
For clusters that run varied workloads, cluster administrators can offer storage classes that map to different quality-of-service levels, backup policies, or any other policies they define.
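A minimal sketch of two storage tiers, assuming the Amazon EBS CSI driver; the class names, volume types, and IOPS figure are hypothetical choices. A general-purpose gp3 class deletes its volumes when released, while a higher-performance io2 class retains them so data survives an accidental claim deletion.

```python
import json
from typing import Dict, Optional


def ebs_class(name: str, ebs_type: str, reclaim_policy: str,
              extra_params: Optional[Dict[str, str]] = None) -> dict:
    """StorageClass for one EBS tier (names and values are illustrative)."""
    params = {"type": ebs_type}
    params.update(extra_params or {})  # provisioner parameters are strings
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": params,
        "reclaimPolicy": reclaim_policy,
        "volumeBindingMode": "WaitForFirstConsumer",
    }


classes = [
    # General-purpose tier: volumes are deleted when their claims go away.
    ebs_class("standard-gp3", "gp3", "Delete"),
    # Critical tier: provisioned-IOPS volumes that are kept for manual cleanup.
    ebs_class("critical-io2", "io2", "Retain", {"iopsPerGB": "50"}),
]

if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": classes}, indent=2))
```

Users then pick a tier by setting storageClassName in their PVCs, without needing to know the EBS details behind it.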
5 Pro Tips for Running Kubernetes on AWS from a Storage Perspective
When it comes to running Kubernetes on AWS from a storage perspective, there are five pro tips you should keep in mind.
1. Include PVCs in the Container Config
Defining PVCs alongside the container config in your deployment’s infrastructure-as-code (IaC) template lets the same template request persistent storage in any cluster, because a PVC describes what storage is needed rather than which volume provides it. This keeps the storage configuration portable instead of tightly coupled to one cluster’s resources. The tip applies both to ephemeral Amazon EKS clusters created on demand and to Amazon EKS clusters deployed across multiple regions and availability zones.
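As a sketch of such a template, here is a hypothetical Python function standing in for the IaC tool, assuming a single-replica Deployment; the function name, the "-data" suffix, and the mount path are illustrative. It renders a PVC and a Deployment that mounts it by claim name, and names no PV or storage class, so any cluster with a default storage class can satisfy it.

```python
import json
from typing import List


def render_app(app: str, image: str, storage_size: str) -> List[dict]:
    """Render a PVC plus a Deployment that mounts it (hypothetical template)."""
    claim_name = f"{app}-data"
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        # No PV or class named here: each cluster satisfies the request itself.
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": storage_size}},
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app},
        "spec": {
            "replicas": 1,
            # Recreate stops the old pod before starting the new one, so two pods
            # never compete for the same ReadWriteOnce volume during a rollout.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": {"app": app}},
            "template": {
                "metadata": {"labels": {"app": app}},
                "spec": {
                    "containers": [{
                        "name": app,
                        "image": image,
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                    "volumes": [{"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}],
                },
            },
        },
    }
    return [pvc, deployment]


if __name__ == "__main__":
    items = render_app("shop", "nginx:1.25", "5Gi")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```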
2. Do Not Include PVs in the Container Config
Including PVs in the container config is not recommended: it tightly couples the workload to a specific volume, and binding fails if that volume is unavailable when the container is instantiated in the pod.
Instead, reference PVCs as pod volumes; the cluster then finds the PV bound to each claim and mounts that volume into the pod.
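One way to enforce this tip is a small check run over an application's manifests before deployment. A minimal sketch with a hypothetical function name: it flags PersistentVolume objects bundled with the app and claims that use spec.volumeName, the PVC field that pins a claim to one named PV.

```python
from typing import List


def coupling_warnings(manifests: List[dict]) -> List[str]:
    """Flag app manifests that tie a workload to a specific PersistentVolume."""
    warnings = []
    for m in manifests:
        kind = m.get("kind")
        name = m.get("metadata", {}).get("name", "<unnamed>")
        if kind == "PersistentVolume":
            # A PV is a cluster resource; shipping one with the app pins it to one cluster.
            warnings.append(f"{name}: PersistentVolume bundled with the app config")
        elif kind == "PersistentVolumeClaim" and m.get("spec", {}).get("volumeName"):
            # spec.volumeName pre-binds the claim to one named PV.
            warnings.append(f"{name}: claim pinned to PV {m['spec']['volumeName']}")
    return warnings


app_config = [
    {"kind": "PersistentVolume", "metadata": {"name": "db-pv"}},
    {"kind": "PersistentVolumeClaim", "metadata": {"name": "db-data"},
     "spec": {"volumeName": "db-pv"}},
    {"kind": "PersistentVolumeClaim", "metadata": {"name": "cache"}, "spec": {}},
]
for warning in coupling_warnings(app_config):
    print(warning)
# db-pv: PersistentVolume bundled with the app config
# db-data: claim pinned to PV db-pv
```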
3. Define a Default Storage Class for PVCs
Cluster administrators can specify a default storage class for PVCs that do not request a particular storage class, and can also create separate storage classes that represent varying workload requirements. This lets users request storage in PVCs by storage class name, or rely on the default, without specifying a volume in each pod’s specification. Without a default storage class, PVCs that do not specify a class cannot be dynamically provisioned and remain unbound unless a matching PV already exists.
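A minimal sketch of a default storage class, assuming the Amazon EBS CSI driver; the class name is hypothetical. The storageclass.kubernetes.io/is-default-class annotation marks the class used for PVCs that omit storageClassName, and the helper lists every class carrying it, since you usually want exactly one.

```python
import json
from typing import List

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"

default_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {
        "name": "gp3-default",  # hypothetical name
        # Marks the class used for PVCs that omit storageClassName.
        "annotations": {DEFAULT_ANNOTATION: "true"},
    },
    "provisioner": "ebs.csi.aws.com",
    "parameters": {"type": "gp3"},
    "volumeBindingMode": "WaitForFirstConsumer",
}


def default_class_names(classes: List[dict]) -> List[str]:
    """Names of classes marked as default; more than one usually signals a misconfiguration."""
    return [
        c["metadata"]["name"]
        for c in classes
        if c.get("metadata", {}).get("annotations", {}).get(DEFAULT_ANNOTATION) == "true"
    ]


if __name__ == "__main__":
    print(json.dumps(default_class, indent=2))
```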
4. Let Users Provide a Storage Class
Cluster administrators can let users supply a storage class name when instantiating the config template. If the user provides one, the template should set it in the storageClassName field of the PVC specification so that the claim matches the correct storage class. Otherwise, the field is left out, and a PV is provisioned automatically from the cluster’s default storage class.
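A minimal sketch of that template logic as a hypothetical Python helper: when the user supplies a class name it is written to storageClassName, and otherwise the field is omitted entirely. Omitting it matters because an explicit empty string means "no class" in Kubernetes, which would bypass the default class and disable dynamic provisioning.

```python
import json
from typing import Optional


def claim_from_template(name: str, size: str, storage_class_name: Optional[str] = None) -> dict:
    """PVC from template inputs; storage_class_name is an optional user-supplied value."""
    spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class_name:
        # The user chose a class: set storageClassName so the claim matches it.
        spec["storageClassName"] = storage_class_name
    # Otherwise leave the field out so the cluster's default class applies.
    # (An explicit "" would instead mean "no class" and disable dynamic provisioning.)
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


print(json.dumps(claim_from_template("reports", "20Gi", "critical-io2"), indent=2))
print(json.dumps(claim_from_template("reports", "20Gi"), indent=2))
```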
5. Look Out for Unbound PVCs
When monitoring your Kubernetes cluster with Prometheus and routing alerts through Alertmanager, keep an eye out for PVCs that remain unbound (in the Pending phase) for a prolonged period. This can mean your cluster lacks dynamic provisioning support or a suitable storage system, in which case users cannot deploy configs that require PVCs. If so, a PV that matches the requirements defined in the PVC must be created manually.
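A minimal sketch of such an alerting rule, assuming kube-state-metrics is installed so that Prometheus scrapes the kube_persistentvolumeclaim_status_phase metric. The alert name, severity label, and 15-minute window are hypothetical choices. The rule group is built as a Python dict and printed as JSON; since JSON is valid YAML, the output should load as a Prometheus rule file, though you may prefer converting it to block-style YAML.

```python
import json

# Assumes kube-state-metrics exposes kube_persistentvolumeclaim_status_phase.
# Alert name, labels, and the 15m window are illustrative choices.
rule_group = {
    "groups": [{
        "name": "storage",
        "rules": [{
            "alert": "PersistentVolumeClaimPending",
            # 1 while the claim is in the Pending phase, i.e. still unbound.
            "expr": 'kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1',
            # Fire only if the claim stays Pending for the whole window.
            "for": "15m",
            "labels": {"severity": "warning"},
            "annotations": {
                "summary": "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} "
                           "has been Pending for 15 minutes",
            },
        }],
    }]
}

if __name__ == "__main__":
    print(json.dumps(rule_group, indent=2))
```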
Implementing best practices for Kubernetes storage enables you to apply optimal storage configurations and dynamically provision suitable storage resources to multiple containerized applications without significant administrative overhead.
To reap the benefits of Kubernetes storage options on AWS, you can gradually adopt these best practices by creating storage classes and enabling dynamic volume provisioning in the cluster. You can choose suitable storage types for your containerized workloads based on the given use case.