If you have an application that needs to read and write the same files simultaneously from multiple Kubernetes pods, you need shared file storage. With Ceph and Rook, you create a redundant filesystem in Kubernetes based on CephFS.
In this guide, you will install Rook, create a basic Ceph cluster, and then configure CephFS as shared ReadWriteMany storage for two pods. The examples assume three PVC-backed OSDs that use the transip-fast-storage StorageClass.
- For a redundant CephFS volume, you need at least three nodes or three PVCs.
- The CephCluster configuration below uses “transip-fast-storage”. If you use a different StorageClass (e.g. transip-big-storage), replace it in all places in the manifest.
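Before you start, it can help to confirm that the StorageClass you plan to use actually exists in the cluster. The snippet below is a minimal sketch; `storageclass_exists` is a hypothetical helper name, not part of Rook or kubectl.

```shell
# Hypothetical helper: succeeds if the named StorageClass exists in the cluster.
storageclass_exists() {
  kubectl get storageclass "$1" >/dev/null 2>&1
}
```

For example, `storageclass_exists transip-fast-storage || echo "StorageClass not found"` prints a warning if the class is missing.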
Install Rook
Step 1
Download a Rook release (see GitHub for the latest version) and go to the example manifests:
git clone --single-branch --branch v1.19.6 https://github.com/rook/rook.git
cd rook/deploy/examples
According to the official Rook quickstart, you should preferably use files from a tagged release instead of the "master" branch, so that your manifests align with a specific Rook version.
Step 2
Install the Rook operator and the necessary CRDs (Custom Resource Definitions):
kubectl create -f crds.yaml -f common.yaml -f csi-operator.yaml -f operator.yaml
This creates, among other things, the namespace rook-ceph, the Ceph CRDs, and the operator that manages the storage cluster.
Step 3
Check if the operator is active:
kubectl -n rook-ceph get pods
Do not proceed until the rook-ceph operator is in “Running” status. Without an operator, Kubernetes cannot process a CephCluster object.
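Instead of re-running the command by hand, you could let kubectl block until the operator is ready. This is a minimal sketch, assuming the operator pod carries the label `app=rook-ceph-operator` (as in the upstream operator.yaml); `wait_for_operator` is a hypothetical helper name.

```shell
# Block until the Rook operator pod reports Ready, or fail after the timeout.
# Assumption: the operator pod is labelled app=rook-ceph-operator.
# The optional first argument overrides the default timeout of 300s.
wait_for_operator() {
  kubectl -n rook-ceph wait --for=condition=Ready pod \
    -l app=rook-ceph-operator --timeout="${1:-300s}"
}
```

Calling `wait_for_operator 120s` returns a non-zero exit code if the pod is not Ready within two minutes.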
Step 4
Create a PVC-based Ceph cluster. This suits cloud environments with dynamically provisioned volumes, such as a cluster whose PVCs are backed by fast storage:
nano cluster-on-pvc.yaml
Place the following configuration in the file:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v19.2.3
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
    volumeClaimTemplate:
      spec:
        storageClassName: transip-fast-storage
        resources:
          requests:
            storage: 10Gi
  storage:
    storageClassDeviceSets:
      - name: set1
        count: 3
        portable: false
        encrypted: false
        volumeClaimTemplates:
          - metadata:
              name: data
            spec:
              resources:
                requests:
                  storage: 10Gi
              storageClassName: transip-fast-storage
              volumeMode: Block
              accessModes:
                - ReadWriteOnce
    onlyApplyOSDPlacement: false
Save the changes and close the file (ctrl + x > y > enter).
Do you prefer using big storage? Then replace transip-fast-storage with transip-big-storage.
Step 5
Create the Ceph cluster:
kubectl apply -f cluster-on-pvc.yaml
This is a PVC-based cluster as Rook recommends for dynamic cloud environments. The monitors use PVCs in filesystem mode and the OSDs receive block-mode volumes via the same StorageClass.
Step 6
Wait until the monitor, manager, and OSD pods are active:
kubectl -n rook-ceph get pods
kubectl -n rook-ceph get cephcluster rook-ceph
Proceed once the monitors are in quorum, a manager is active, and you see at least three OSDs that are “up” and “in”. If the status does not become healthy, first check your StorageClass and node capacity.
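The health check can also be polled from a script. The sketch below assumes that the CephCluster resource exposes the Ceph health string under `.status.ceph.health`. It treats `HEALTH_OK` as a proxy for the conditions above, so it does not count OSDs itself. The function names and the `POLL_SECONDS` variable are hypothetical.

```shell
# Print the Ceph health string that Rook records on the CephCluster resource.
# Assumption: the field is .status.ceph.health (for example HEALTH_OK).
ceph_health() {
  kubectl -n rook-ceph get cephcluster rook-ceph \
    -o jsonpath='{.status.ceph.health}'
}

# Poll until the health is HEALTH_OK. The first argument is the number of
# attempts (default 60); POLL_SECONDS is the pause between attempts (default 10).
wait_for_health_ok() {
  _tries="${1:-60}"
  while [ "$_tries" -gt 0 ]; do
    [ "$(ceph_health)" = "HEALTH_OK" ] && return 0
    _tries=$((_tries - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}
```

With the defaults, `wait_for_health_ok` gives up after roughly ten minutes and returns a non-zero exit code.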
Creating the CephFilesystem
Step 1
Create a manifest file for the Ceph filesystem:
nano filesystem.yaml
Place the following configuration in the file:
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: sharedfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - name: replicated
      replicated:
        size: 3
  preserveFilesystemOnDelete: true
  metadataServer:
    activeCount: 1
    activeStandby: true
Save the changes and close the file (ctrl + x > y > enter).
Step 2
Now actually create the filesystem:
kubectl apply -f filesystem.yaml
This creates a metadata pool and a data pool with replication across three OSDs. `activeStandby: true` also keeps a second MDS pod on standby for failover.
Step 3
Check if the MDS pods have started:
kubectl -n rook-ceph get pods -l app=rook-ceph-mds
You should see two pods for `sharedfs`: one active MDS and one standby. The filesystem only provides a usable shared file service once both are running.
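To check the expected pod count from a script, you could count the MDS pods that are in the Running phase. This is a minimal sketch that reuses the `app=rook-ceph-mds` label from the command above; `count_running_mds` is a hypothetical helper name.

```shell
# Count the MDS pods that are currently in the Running phase.
count_running_mds() {
  kubectl -n rook-ceph get pods -l app=rook-ceph-mds \
    --field-selector=status.phase=Running --no-headers 2>/dev/null |
    wc -l | tr -d ' '
}
```

With the filesystem manifest from this guide, `[ "$(count_running_mds)" -ge 2 ]` should succeed once the active and the standby MDS have both started.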
Create a StorageClass for CephFS
Step 1
Create a StorageClass that uses the new filesystem:
nano storageclass.yaml
Place the following configuration in the file:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs-sharedfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: sharedfs
  pool: sharedfs-replicated
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-publish-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-publish-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
Save the changes and close the file (ctrl + x > y > enter).
Step 2
Create the StorageClass:
kubectl apply -f storageclass.yaml
The field “pool: sharedfs-replicated” refers to the data pool of the CephFilesystem. If you use a different name for the filesystem or its data pool, adjust this field accordingly.
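The pool name in this guide follows the pattern filesystem name, hyphen, data pool name (`sharedfs` plus `replicated`). A minimal sketch of that rule, assuming your Rook version names data pools the same way; `cephfs_pool_name` is a hypothetical helper name:

```shell
# Build the value for the StorageClass "pool" parameter from the
# CephFilesystem name and the name of its data pool.
cephfs_pool_name() {
  printf '%s-%s\n' "$1" "$2"
}

cephfs_pool_name sharedfs replicated   # prints sharedfs-replicated
```

If you rename the filesystem, also update `fsName` in the StorageClass so that both values stay in sync.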
Create a shared persistent volume claim
You are free to change the filenames, metadata name, and namespace to a name of your choice. Please note that you must then change these consistently in all subsequent steps.
Step 1
Create a manifest with a namespace, a persistent volume claim and two pods that mount the same volume:
nano cephfs-test.yaml
Place the following configuration in the file:
apiVersion: v1
kind: Namespace
metadata:
  name: cephfs-test
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
  namespace: cephfs-test
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs-sharedfs
---
apiVersion: v1
kind: Pod
metadata:
  name: writer
  namespace: cephfs-test
spec:
  restartPolicy: Never
  containers:
    - name: busybox
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        - |
          echo "writer-start $(date -u +%FT%TZ)" > /data/hello.txt
          sync
          sleep 3600
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-data
---
apiVersion: v1
kind: Pod
metadata:
  name: reader
  namespace: cephfs-test
spec:
  restartPolicy: Never
  containers:
    - name: busybox
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        - "sleep 3600"
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-data
Save the changes and close the file (ctrl + x > y > enter).
Step 2
Create the namespace, PVC, and pods:
kubectl apply -f cephfs-test.yaml
This configuration uses “ReadWriteMany”, so that multiple pods can use the same volume simultaneously.
Step 3
Check if the PVC is bound and the pods have started:
kubectl -n cephfs-test get pvc,pods
The PVC should be “Bound” and both pods should be in “Running” status.
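If you prefer to wait for these states rather than poll manually, a sketch like the following could work. It assumes a kubectl version that supports `kubectl wait --for=jsonpath=...` (v1.23 or later); `wait_for_test_workload` is a hypothetical helper name, and the 120s timeouts are arbitrary.

```shell
# Wait until the PVC is Bound, then until both test pods are Ready.
# Assumption: kubectl v1.23+ for the --for=jsonpath form of "kubectl wait".
wait_for_test_workload() {
  kubectl -n cephfs-test wait --for=jsonpath='{.status.phase}'=Bound \
    pvc/shared-data --timeout=120s &&
  kubectl -n cephfs-test wait --for=condition=Ready \
    pod/writer pod/reader --timeout=120s
}
```

The function returns a non-zero exit code as soon as either wait times out.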
Check if both pods see the same data
Step 1
Read the file from the writer pod:
kubectl -n cephfs-test exec writer -- cat /data/hello.txt
Step 2
From the reader pod, add an extra line and then show the content again:
kubectl -n cephfs-test exec reader -- sh -c 'echo reader-check $(date -u +%FT%TZ) >> /data/hello.txt && cat /data/hello.txt'
If you see both “writer-start” and “reader-check” in the output, then both pods mount the same CephFS volume and shared storage works as intended.
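The same check can be scripted by searching the file content for both markers. This is a minimal sketch; `has_both_markers` is a hypothetical helper that takes the file content as its only argument.

```shell
# Succeed only if the given text contains a line starting with "writer-start "
# and a line starting with "reader-check ".
has_both_markers() {
  printf '%s\n' "$1" | grep -q '^writer-start ' &&
  printf '%s\n' "$1" | grep -q '^reader-check '
}
```

For example: `has_both_markers "$(kubectl -n cephfs-test exec writer -- cat /data/hello.txt)" && echo "shared volume OK"`.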
Take into account the limits of shared file storage
- Namespace scope: A PVC remains bound to one namespace. If you want to share data across namespaces, this requires a different setup than just mounting the same PVC.
- Application behavior: Shared storage is only safe if your application can handle concurrent write operations and file locking.
- Scheduling: If you want to explicitly test across different nodes, position pods specifically using labels, selectors, or anti-affinity.
- Backup strategy: Replication protects against disk or node failure, but not against logical errors or unwanted changes to files. Therefore, create backups or snapshots in addition.
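For the scheduling point above, you can check whether the two test pods actually landed on different nodes before drawing conclusions from the test. This is a minimal sketch; `pod_node` and `on_different_nodes` are hypothetical helper names.

```shell
# Print the name of the node that a pod in the cephfs-test namespace runs on.
pod_node() {
  kubectl -n cephfs-test get pod "$1" -o jsonpath='{.spec.nodeName}'
}

# Succeed only if the two named pods run on different nodes.
on_different_nodes() {
  [ "$(pod_node "$1")" != "$(pod_node "$2")" ]
}
```

For example, `on_different_nodes writer reader || echo "both pods share a node"` warns you when the test did not cross a node boundary.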
By using CephFS via Rook, you combine shared file storage with storage-level redundancy. This allows multiple pods to use the same data simultaneously, while the underlying data remains replicated across multiple OSDs.