Configure Ceph Shared File System (CephFS)#
The Ceph Shared File System, or CephFS, provides the ability to create read/write shared file system Persistent Volumes (PVs). These PVs support the ReadWriteMany access mode for the Filesystem volume mode.
CephFS deploys its own daemons called Metadata Servers, or Ceph MDS. For details, see Ceph Documentation: Ceph File System.
Note

By design, the CephFS metadata pool and the default data pool must be replicated only.
CephFS specification parameters #
The CephDeployment custom resource (CR) spec includes the sharedFilesystem.cephFS section with the following CephFS parameters:
- name - CephFS instance name.
- dataPools - A list of CephFS data pool specifications. Each spec contains the name, replicated or erasureCoded, deviceClass, and failureDomain parameters. The first pool in the list is treated as the default data pool for CephFS and must always be replicated. The failureDomain parameter may be set to host, rack, and so on, defining the failure domain across which the data will be spread. The number of data pools is unlimited, but the default pool must always be present. For example:

  ```yaml
  spec:
    sharedFilesystem:
      cephFS:
      - name: cephfs-store
        dataPools:
        - name: default-pool
          deviceClass: ssd
          replicated:
            size: 3
          failureDomain: host
        - name: second-pool
          deviceClass: hdd
          failureDomain: rack
          erasureCoded:
            dataChunks: 2
            codingChunks: 1
  ```
  where replicated.size is the number of full copies of data on multiple nodes.

  Warning

  When using a non-recommended replicated.size of less than 3 for Ceph pools, Ceph OSD removal cannot be performed. The minimal replica size equals a rounded-up half of the specified replicated.size.

  For example, if replicated.size is 2, the minimal replica size is 1, and if replicated.size is 3, the minimal replica size is 2. A replica size of 1 allows Ceph to have PGs with only one Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED health warning that blocks Ceph OSD removal. We recommend setting replicated.size to 3 for each Ceph pool.

  Warning

  Modifying dataPools on a deployed CephFS has no effect. You can manually adjust pool settings through the Ceph CLI. However, for any changes in dataPools, we recommend re-creating CephFS.
- metadataPool - CephFS metadata pool specification that should only contain the replicated, deviceClass, and failureDomain parameters. The failureDomain parameter may be set to host, rack, and so on, defining the failure domain across which the data will be spread. The metadata pool can use only replicated settings. For example:

  ```yaml
  spec:
    sharedFilesystem:
      cephFS:
      - name: cephfs-store
        metadataPool:
          deviceClass: nvme
          replicated:
            size: 3
          failureDomain: host
  ```
  where replicated.size is the number of full copies of data on multiple nodes.

  Warning

  Modifying metadataPool on a deployed CephFS has no effect. You can manually adjust pool settings through the Ceph CLI. However, for any changes in metadataPool, we recommend re-creating CephFS.
- preserveFilesystemOnDelete - Defines whether to preserve the data and metadata pools when CephFS is deleted. Set to true to avoid accidental data loss in case of human error. However, for security reasons, we recommend setting preserveFilesystemOnDelete to false. For a combined example that includes this parameter, see the sketch after this list.
- metadataServer - Metadata Server settings that correspond to the Ceph MDS daemon settings. Contains the following fields:

  - activeCount - The number of active Ceph MDS instances. As the load increases, CephFS will automatically partition the file system across the Ceph MDS instances. Rook will create double the number of Ceph MDS instances requested by activeCount. The extra instances will be in standby mode for failover. We recommend setting this parameter to 1 and increasing the MDS daemons count only in case of a high load.
  - activeStandby - Defines whether the extra Ceph MDS instances will be in active standby mode and will keep a warm cache of the file system metadata for faster failover. CephFS will assign the instances in failover pairs. If false, the extra Ceph MDS instances will all be in passive standby mode and will not maintain a warm cache of the metadata. The default value is false.
  - resources - Kubernetes resource requirements for Ceph MDS pods.

  For example:

  ```yaml
  spec:
    sharedFilesystem:
      cephFS:
      - name: cephfs-store
        metadataServer:
          activeCount: 1
          activeStandby: false
          resources: # example, non-prod values
            requests:
              memory: 1Gi
              cpu: 1
            limits:
              memory: 2Gi
              cpu: 2
  ```
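The parameters above can also be combined in a single cephFS entry. The following is a minimal sketch, not a production recommendation: the instance name, pool names, and device classes are placeholders, and the preserveFilesystemOnDelete value should match your data-retention policy:

```yaml
spec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store                  # placeholder instance name
      dataPools:
      - name: default-pool                # default data pool, must be replicated
        deviceClass: hdd
        replicated:
          size: 3
        failureDomain: host
      metadataPool:                       # metadata pool, replicated only
        deviceClass: nvme
        replicated:
          size: 3
        failureDomain: host
      preserveFilesystemOnDelete: false   # set to true to keep the pools when CephFS is deleted
      metadataServer:
        activeCount: 1
        activeStandby: false
```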
Configure CephFS#
- Optional. Override the CSI CephFS gRPC and liveness metrics ports, for example, if an application already uses the default CephFS ports 9092 and 9082, which may cause conflicts on the node. Upgrade the Pelagia Helm release values with the desired port numbers:

  ```bash
  helm upgrade --install pelagia-ceph oci://registry.mirantis.com/pelagia/pelagia-ceph --version 1.0.0 -n pelagia \
    --set rookConfig.csiCephFsGPCMetricsPort=<desiredPort>,rookConfig.csiCephFsLivenessMetricsPort=<desiredPort>
  ```

  Rook will enable the CephFS CSI plugin and provisioner.
- Open the CephDeployment CR for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the sharedFilesystem section, specify the parameters according to the CephFS specification above. For example:

  ```yaml
  spec:
    sharedFilesystem:
      cephFS:
      - name: cephfs-store
        dataPools:
        - name: cephfs-pool-1
          deviceClass: hdd
          replicated:
            size: 3
          failureDomain: host
        metadataPool:
          deviceClass: nvme
          replicated:
            size: 3
          failureDomain: host
        metadataServer:
          activeCount: 1
          activeStandby: false
  ```
- Define the mds role for the corresponding nodes where Ceph MDS daemons should be deployed. We recommend labeling only one node with the mds role. For example:

  ```yaml
  spec:
    nodes:
      ...
      worker-1:
        roles:
        ...
        - mds
  ```
Once CephFS is specified in the CephDeployment CR, Pelagia Deployment Controller will validate it and request Rook to create CephFS. Then Pelagia Deployment Controller will create a Kubernetes StorageClass, required to start provisioning the storage, which will operate the CephFS CSI driver to create Kubernetes PVs.
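Once the StorageClass exists, workloads can request shared volumes from it. The following is a minimal PersistentVolumeClaim sketch, assuming the created StorageClass is named cephfs-store; the claim name, namespace, size, and StorageClass name are placeholders to replace with the actual values from your cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-shared-pvc        # placeholder claim name
  namespace: default
spec:
  accessModes:
  - ReadWriteMany                # shared read/write access backed by CephFS
  volumeMode: Filesystem
  storageClassName: cephfs-store # assumption: replace with the actual StorageClass name
  resources:
    requests:
      storage: 10Gi
```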