
    Redundant shared storage in Kubernetes with Rook CephFS

    If you have an application that needs to read and write the same files simultaneously from multiple Kubernetes pods, you need shared file storage. With Ceph and Rook, you create a redundant filesystem in Kubernetes based on CephFS.

    In this guide, you will install Rook, create a basic Ceph cluster, and then configure CephFS as shared ReadWriteMany storage for two pods. The examples assume a cluster in which three OSDs are backed by PVCs from the transip-fast-storage StorageClass.

    • For a redundant CephFS volume, you need at least three nodes or three PVCs.
       
    • The CephCluster configuration below uses “transip-fast-storage”. If you use a different StorageClass (e.g. transip-big-storage), replace it in all places in the manifest.
     

     

    Install Rook

     

    Step 1

    Download a Rook release (see GitHub for the latest version) and go to the directory with the example manifests:

    git clone --single-branch --branch v1.19.6 https://github.com/rook/rook.git
    cd rook/deploy/examples

    According to the official Rook quickstart, you should preferably use files from a tagged release instead of the "master" branch, so that your manifests align with a specific Rook version.


     

    Step 2

    Install the Rook operator and the necessary CRDs (Custom Resource Definitions):

    kubectl create -f crds.yaml -f common.yaml -f csi-operator.yaml -f operator.yaml

    This creates, among other things, the namespace rook-ceph, the Ceph CRDs, and the operator that manages the storage cluster.


     

    Step 3

    Check if the operator is active:

    kubectl -n rook-ceph get pods

    Do not proceed until the rook-ceph operator is in “Running” status. Without an operator, Kubernetes cannot process a CephCluster object.
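
Instead of re-running the command by hand, the check can be scripted. The sketch below wraps `kubectl wait` in a small helper; the function name `wait_for_operator` is made up for this guide, and it assumes the operator pod carries the label `app=rook-ceph-operator`, as in the standard Rook example manifests.

```shell
# Block until the operator pod reports Ready, or fail after the timeout.
# Assumption: the operator pod is labelled app=rook-ceph-operator.
# Note: kubectl wait reports an error if no matching pod exists yet,
# so run this a few seconds after creating the operator.
wait_for_operator() {
  timeout="${1:-300s}"
  kubectl -n rook-ceph wait pod \
    -l app=rook-ceph-operator \
    --for=condition=Ready \
    --timeout="$timeout"
}

# Usage: wait_for_operator 600s
```

The helper exits with a non-zero status when the timeout expires, so it can be used as a gate in a script before the CephCluster is applied.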


     

    Step 4

    Create a PVC-based Ceph cluster. This suits a cloud environment with dynamically provisioned volumes, such as a cluster whose PVCs are backed by fast storage:

    nano cluster-on-pvc.yaml

    Replace the contents of the file with the following configuration (the Rook examples directory already contains a file with this name):

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: quay.io/ceph/ceph:v19.2.3
      dataDirHostPath: /var/lib/rook
      mon:
        count: 3
        allowMultiplePerNode: false
        volumeClaimTemplate:
          spec:
            storageClassName: transip-fast-storage
            resources:
              requests:
                storage: 10Gi
      storage:
        storageClassDeviceSets:
        - name: set1
          count: 3
          portable: false
          encrypted: false
          volumeClaimTemplates:
          - metadata:
              name: data
            spec:
              resources:
                requests:
                  storage: 10Gi
              storageClassName: transip-fast-storage
              volumeMode: Block
              accessModes:
                - ReadWriteOnce
        onlyApplyOSDPlacement: false

    Save the changes and close the file (ctrl + x > y > enter). 

    Do you prefer using big storage? Then replace transip-fast-storage with transip-big-storage.
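
Because the StorageClass appears in both the monitor and the OSD volume claim templates, a search-and-replace is less error-prone than editing by hand. The following is a minimal sketch; the helper name `use_storage_class` is hypothetical and it assumes the manifest still contains the literal string transip-fast-storage.

```shell
# Rewrite every occurrence of the fast StorageClass in a manifest.
# $1 = manifest file, $2 = StorageClass to use instead.
use_storage_class() {
  manifest="$1"
  new_class="$2"
  sed "s/transip-fast-storage/${new_class}/g" "$manifest" > "${manifest}.tmp" &&
    mv "${manifest}.tmp" "$manifest"
  # Show the resulting lines so you can confirm that both the mon and
  # the OSD templates were changed.
  grep -n "storageClassName" "$manifest"
}

# Usage: use_storage_class cluster-on-pvc.yaml transip-big-storage
```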


     

    Step 5

    Create the Ceph cluster:

    kubectl apply -f cluster-on-pvc.yaml

    This is a PVC-based cluster as Rook recommends for dynamic cloud environments. The monitors use PVCs in filesystem mode and the OSDs receive block-mode volumes via the same StorageClass.


     

    Step 6

    Wait until the monitor, manager, and OSD pods are active:

    kubectl -n rook-ceph get pods
    kubectl -n rook-ceph get cephcluster rook-ceph

    Proceed once the monitors are in quorum, a manager is active, and you see at least three OSDs that are “up” and “in”. If the status does not become healthy, first check your StorageClass and node capacity.
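
Waiting for a healthy cluster can also be automated. The sketch below polls the health string on the CephCluster object; it assumes Rook exposes this value at `.status.ceph.health` (the field behind the HEALTH column of `kubectl get cephcluster`), and the function names are illustrative only.

```shell
# Print the health string Rook records on the CephCluster object.
# Assumption: the value lives at .status.ceph.health (e.g. HEALTH_OK).
ceph_health() {
  kubectl -n rook-ceph get cephcluster rook-ceph \
    -o jsonpath='{.status.ceph.health}'
}

# Poll until the cluster reports HEALTH_OK.
# $1 = maximum number of attempts, 10 seconds apart (default 60).
wait_for_health_ok() {
  attempts="${1:-60}"
  while [ "$attempts" -gt 0 ]; do
    health="$(ceph_health)"
    if [ "$health" = "HEALTH_OK" ]; then
      echo "Ceph reports HEALTH_OK"
      return 0
    fi
    echo "Current health: ${health:-unknown}, retrying..."
    attempts=$((attempts - 1))
    sleep 10
  done
  return 1
}

# Usage: wait_for_health_ok 90
```

A status of HEALTH_WARN is not necessarily fatal during startup, which is why the helper keeps polling instead of stopping at the first non-OK value.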


     

    Creating the CephFilesystem

     

    Step 1

    Create a manifest file for the Ceph filesystem:

    nano filesystem.yaml

    Place the following configuration in the file:

    apiVersion: ceph.rook.io/v1
    kind: CephFilesystem
    metadata:
      name: sharedfs
      namespace: rook-ceph
    spec:
      metadataPool:
        replicated:
          size: 3
      dataPools:
        - name: replicated
          replicated:
            size: 3
      preserveFilesystemOnDelete: true
      metadataServer:
        activeCount: 1
        activeStandby: true

    Save the changes and close the file (ctrl + x > y > enter).


     

    Step 2

    Apply the manifest to create the filesystem:

    kubectl apply -f filesystem.yaml

    This creates a metadata pool and a data pool with replication across three OSDs. `activeStandby: true` also keeps a second MDS pod on standby for failover.


     

    Step 3

    Check if the MDS pods have started:

    kubectl -n rook-ceph get pods -l app=rook-ceph-mds

    You should see two pods for `sharedfs`: one active MDS and one standby. The filesystem can only serve clients once an MDS is active.
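
To check the expected pod count without reading the output yourself, the pods can be counted. This is a small sketch with hypothetical helper names; it relies only on the `app=rook-ceph-mds` label used above and on the standard `status.phase` field selector.

```shell
# Count MDS pods that are in the Running phase.
running_mds_count() {
  kubectl -n rook-ceph get pods -l app=rook-ceph-mds \
    --field-selector=status.phase=Running -o name | wc -l | tr -d ' '
}

# Succeeds only when the expected number of MDS pods is running.
# $1 = expected count (default 2: one active, one standby).
mds_ready() {
  expected="${1:-2}"
  [ "$(running_mds_count)" -eq "$expected" ]
}

# Usage: mds_ready 2 && echo "MDS pods are running"
```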


     

    Create a StorageClass for CephFS

     

    Step 1

    Create a StorageClass that uses the new filesystem:

    nano storageclass.yaml

    Place the following configuration in the file:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: rook-cephfs-sharedfs
    provisioner: rook-ceph.cephfs.csi.ceph.com
    parameters:
      clusterID: rook-ceph
      fsName: sharedfs
      pool: sharedfs-replicated
      csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
      csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
      csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
      csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
      csi.storage.k8s.io/controller-publish-secret-name: rook-csi-cephfs-provisioner
      csi.storage.k8s.io/controller-publish-secret-namespace: rook-ceph
      csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
      csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    reclaimPolicy: Delete
    allowVolumeExpansion: true
    volumeBindingMode: Immediate

    Save the changes and close the file (ctrl + x > y > enter).


     

    Step 2

    Create the StorageClass:

    kubectl apply -f storageclass.yaml

    The field “pool: sharedfs-replicated” refers to the data pool of the CephFilesystem. Rook names this pool after the filesystem name followed by the data pool name, so if you rename the filesystem or the data pool, adjust both “fsName” and “pool” accordingly.
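
The pool name follows from two values in filesystem.yaml: the `metadata.name` of the CephFilesystem and the `name` of the entry under `dataPools`, joined by a hyphen. A tiny illustration (the helper `cephfs_pool_name` is made up for this guide):

```shell
# Derive the Ceph pool name for a CephFilesystem data pool.
# $1 = metadata.name of the CephFilesystem
# $2 = name of the entry under spec.dataPools
cephfs_pool_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Usage: cephfs_pool_name sharedfs replicated
# prints: sharedfs-replicated
```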


     

    Create a shared persistent volume claim

     

    You are free to change the filenames, metadata name, and namespace to a name of your choice. Please note that you must then change these consistently in all subsequent steps.

     

     

    Step 1

    Create a manifest with a namespace, a persistent volume claim and two pods that mount the same volume:

    nano cephfs-test.yaml

    Place the following configuration in the file:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: cephfs-test
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: shared-data
      namespace: cephfs-test
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 1Gi
      storageClassName: rook-cephfs-sharedfs
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: writer
      namespace: cephfs-test
    spec:
      restartPolicy: Never
      containers:
        - name: busybox
          image: busybox:1.36
          command: ["/bin/sh", "-c"]
          args:
            - |
              echo "writer-start $(date -u +%FT%TZ)" > /data/hello.txt
              sync
              sleep 3600
          volumeMounts:
            - name: shared
              mountPath: /data
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-data
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: reader
      namespace: cephfs-test
    spec:
      restartPolicy: Never
      containers:
        - name: busybox
          image: busybox:1.36
          command: ["/bin/sh", "-c"]
          args:
            - "sleep 3600"
          volumeMounts:
            - name: shared
              mountPath: /data
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-data

    Save the changes and close the file (ctrl + x > y > enter).


     

    Step 2

    Create the namespace, PVC, and pods:

    kubectl apply -f cephfs-test.yaml

    This configuration uses “ReadWriteMany”, so that multiple pods can use the same volume simultaneously.


     

    Step 3

    Check if the PVC is bound and the pods have started:

    kubectl -n cephfs-test get pvc,pods

    The PVC should be “Bound” and both pods should be in “Running” status.
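
The same check can be expressed as a blocking wait. The sketch below assumes a kubectl version that supports `--for=jsonpath` (1.23 or newer); the function name is illustrative and the timeouts are arbitrary.

```shell
# Wait until the PVC is Bound, then until both test pods are Ready.
# $1 = namespace (default: cephfs-test).
wait_for_test_workload() {
  ns="${1:-cephfs-test}"
  kubectl -n "$ns" wait pvc/shared-data \
    --for=jsonpath='{.status.phase}'=Bound --timeout=120s &&
  kubectl -n "$ns" wait pod/writer pod/reader \
    --for=condition=Ready --timeout=180s
}

# Usage: wait_for_test_workload
```

If the first wait times out, the PVC is usually still “Pending”; `kubectl -n cephfs-test describe pvc shared-data` then shows the provisioning events.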


     

    Check if both pods see the same data

     

    Step 1

    Read the file from the writer pod:

    kubectl -n cephfs-test exec writer -- cat /data/hello.txt

     

    Step 2

    From the reader pod, add an extra line and then show the content again:

    kubectl -n cephfs-test exec reader -- sh -c 'echo reader-check $(date -u +%FT%TZ) >> /data/hello.txt && cat /data/hello.txt'

    If you see both “writer-start” and “reader-check” in the output, then both pods mount the same CephFS volume and shared storage works as intended.
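
If you want to repeat this check in a script, the comparison can be automated. The following sketch reads the file through one pod and looks for both markers; the helper name `verify_shared_file` is hypothetical, and it assumes the pod names and file path from cephfs-test.yaml.

```shell
# Read the shared file through one pod and confirm both markers are present.
# $1 = pod to read from (default: reader).
verify_shared_file() {
  pod="${1:-reader}"
  content="$(kubectl -n cephfs-test exec "$pod" -- cat /data/hello.txt)" || return 1
  for marker in writer-start reader-check; do
    if ! printf '%s\n' "$content" | grep -q "$marker"; then
      echo "missing marker: $marker" >&2
      return 1
    fi
  done
  echo "both markers found via pod $pod"
}

# Usage: verify_shared_file writer
```

Reading through the writer pod after appending from the reader pod is the stricter test, because it shows that the change made by one pod is visible to the other.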


     

    Take into account the limits of shared file storage

     

    • Namespace scope: A PVC remains bound to one namespace. If you want to share data across namespaces, this requires a different setup than just mounting the same PVC.
    • Application behavior: Shared storage is only safe if your application can handle concurrent write operations and file locking.
    • Scheduling: If you want to explicitly test across different nodes, position pods specifically using labels, selectors, or anti-affinity.
    • Backup strategy: Replication protects against disk or node failure, but not against logical errors or unwanted changes to files. Therefore, create backups or snapshots in addition.
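
For the scheduling point, it helps to know where the test pods actually landed before you add selectors or anti-affinity. A minimal sketch, assuming the writer and reader pods from cephfs-test.yaml; the function names are illustrative:

```shell
# Print the node a test pod was scheduled on.
# $1 = pod name in the cephfs-test namespace.
pod_node() {
  kubectl -n cephfs-test get pod "$1" -o jsonpath='{.spec.nodeName}'
}

# Succeeds when writer and reader run on different nodes.
on_different_nodes() {
  writer_node="$(pod_node writer)"
  reader_node="$(pod_node reader)"
  echo "writer: $writer_node, reader: $reader_node"
  [ -n "$writer_node" ] && [ "$writer_node" != "$reader_node" ]
}

# Usage: on_different_nodes || echo "both pods share a node"
```

If both pods share a node, the earlier read/write test still proves that the volume is shared, but it does not yet show that access works across nodes.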

     

    By using CephFS via Rook, you combine shared file storage with storage-level redundancy. This allows multiple pods to use the same data simultaneously, while the underlying data remains replicated across multiple OSDs.
