
CephDeployment Custom Resource#

This section describes how to configure a Ceph cluster using the CephDeployment (cephdeployments.lcm.mirantis.com) custom resource (CR).

The CephDeployment CR spec defines which nodes to deploy as Ceph components. Based on the role definitions in the CephDeployment CR, the Pelagia Deployment Controller automatically labels nodes for Ceph Monitors and Managers. Ceph OSDs are deployed based on the devices parameter defined for each Ceph node.

For the default CephDeployment CR, see the following example:

Example configuration of Ceph specification
apiVersion: lcm.mirantis.com/v1alpha1
kind: CephDeployment
metadata:
  name: pelagia-ceph
  namespace: pelagia
spec:
  nodes:
  - name: cluster-storage-controlplane-0
    roles:
    - mgr
    - mon
    - mds
  - name: cluster-storage-controlplane-1
    roles:
    - mgr
    - mon
    - mds
  - name: cluster-storage-controlplane-2
    roles:
    - mgr
    - mon
    - mds
  - name: cluster-storage-worker-0
    roles: []
    devices:
    - config:
        deviceClass: ssd
      fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231434939
  - name: cluster-storage-worker-1
    roles: []
    devices:
    - config:
        deviceClass: ssd
      fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231440912
  - name: cluster-storage-worker-2
    roles: []
    devices:
    - config:
        deviceClass: ssd
      fullPath: /dev/disk/by-id/scsi-1ATA_WDC_WDS100T2B0A-00SM50_200231443409
  pools:
  - default: true
    deviceClass: ssd
    name: kubernetes
    replicated:
      size: 3
  objectStorage:
    rgw:
      name: rgw-store
      dataPool:
        deviceClass: ssd
        replicated:
          size: 3
      metadataPool:
        deviceClass: ssd
        replicated:
          size: 3
      gateway:
        allNodes: false
        instances: 3
        port: 8081
        securePort: 8443
      preservePoolsOnDelete: false
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      dataPools:
      - name: cephfs-pool-1
        deviceClass: ssd
        replicated:
          size: 3
      metadataPool:
        deviceClass: ssd
        replicated:
          size: 3
      metadataServer:
        activeCount: 1
        activeStandby: false

Configure a Ceph cluster with CephDeployment#

  1. Select from the following options:

    • If you do not have a Ceph cluster yet, create cephdeployment.yaml for editing.
    • If the Ceph cluster is already deployed, open the CephDeployment CR for editing:
    kubectl -n pelagia edit cephdpl
    
  2. Using the tables below, configure the Ceph cluster as required.

  3. Select from the following options:

    • If you are creating a Ceph cluster, save the updated CephDeployment template to the corresponding file and apply the file to the cluster:
      kubectl apply -f cephdeployment.yaml
      
    • If you are editing an existing CephDeployment, save the changes and exit the text editor to apply them.
  4. Verify the CephDeployment reconcile status using the Status fields.
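
     For example, to check the current phase, assuming the CR is named pelagia-ceph as in the example above:

     kubectl -n pelagia get cephdpl pelagia-ceph -o jsonpath='{.status.phase}'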

CephDeployment configuration options#

The following subsections contain a description of CephDeployment parameters for an advanced configuration.

General parameters #

  • network - Specifies access and public networks for the Ceph cluster. For details, see Network parameters.
  • nodes - Specifies the list of Ceph nodes. For details, see Nodes parameters. The nodes parameter is a list of Ceph node specifications. Each list item can define the specification for a single node or for a group of nodes, either listed explicitly or selected by label, and both approaches can be combined.
  • pools - Specifies the list of Ceph pools. For details, see Pools parameters.
  • clients - Specifies the list of Ceph clients. For details, see Clients parameters.
  • objectStorage - Specifies the parameters for Object Storage, such as RADOS Gateway, the Ceph Object Storage. Also specifies the RADOS Gateway Multisite configuration. For details, see RADOS Gateway parameters and Multisite parameters.
  • ingressConfig - Enables a custom ingress rule for public access to Ceph services, for example, Ceph RADOS Gateway. For details, see Configure Ceph Object Gateway TLS.
  • sharedFilesystem - Enables the Ceph Filesystem. For details, see CephFS parameters.
  • rookConfig - String key-value parameter that allows overriding Ceph configuration options. For details, see RookConfig parameters.
  • healthCheck - Configures health checks and liveness probe settings for Ceph daemons. For details, see HealthCheck parameters.
  • extraOpts - Enables specifying extra options for a setup, including the deviceLabels parameter. For details, see ExtraOpts parameters.
  • mgr - Specifies the list of Ceph Manager modules to be enabled or disabled. For details, see Manager modules parameters. The balancer and pg_autoscaler modules are enabled by default.
  • dashboard - Enables the Ceph dashboard. Pelagia currently does not support the Ceph Dashboard. Defaults to false.
  • rbdMirror - Specifies the parameters for RBD Mirroring. For details, see RBD Mirroring parameters.
  • external - Enables external Ceph cluster mode. If enabled, Pelagia reads a special Secret containing the credentials of the external Ceph cluster to connect to.

Network parameters #

  • clusterNet - specifies a Classless Inter-Domain Routing (CIDR) for the Ceph OSD replication network.

    Warning

    To avoid ambiguous behavior of Ceph daemons, do not specify 0.0.0.0/0 in clusterNet. Otherwise, Ceph daemons can select an incorrect public interface that can cause the Ceph cluster to become unavailable.

    Note

    The clusterNet and publicNet parameters support multiple IP networks. For details, see Ops Guide: Enable Multinetworking.

  • publicNet - specifies a CIDR for communication between the service and operator.

    Warning

    To avoid ambiguous behavior of Ceph daemons, do not specify 0.0.0.0/0 in publicNet. Otherwise, Ceph daemons can select an incorrect public interface that can cause the Ceph cluster to become unavailable.

    Note

    The clusterNet and publicNet parameters support multiple IP networks. For details, see Ops Guide: Enable Multinetworking.

Example configuration:

spec:
  network:
    clusterNet: 10.10.0.0/24
    publicNet:  192.100.0.0/24
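
Both parameters can also carry multiple IP networks, as noted above. The exact syntax is described in Ops Guide: Enable Multinetworking; the sketch below assumes a comma-separated CIDR form and is for illustration only:

spec:
  network:
    clusterNet: 10.10.0.0/24,10.11.0.0/24
    publicNet: 192.100.0.0/24,192.168.0.0/24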

Nodes parameters #

  • name - Mandatory. Specifies the following:

    • If the node spec applies to a single node, name is the name of the node on which the Ceph node should be deployed, for example, cluster-storage-worker-0.
    • If the node spec applies to a group of nodes, name is the group name, for example, group-rack-1. In this case, the Ceph node specification must define either the nodeGroup or nodesByLabel field.
  • nodeGroup - Optional. Specifies an explicit list of node names. The Ceph node specification applies to all nodes in this list. For example:

    spec:
      nodes:
      - name: group-1
        nodeGroup: [node-X, node-Y]
    
  • nodesByLabel - Optional. Specifies a label selector expression. The Ceph node specification applies to all nodes matching this expression. For example:

    spec:
      nodes:
      - name: group-1
        nodesByLabel: "ceph-storage-node=true,!ceph-control-node"
    
  • roles - Optional. Specifies the mon, mgr, rgw, or mds daemons to be installed on a Ceph node. You can place the daemons on any nodes at your discretion. Consider the following recommendations:

    • The recommended number of Ceph Monitors in a Ceph cluster is 3. Therefore, at least three Ceph nodes must contain the mon item in the roles parameter.
    • The number of Ceph Monitors must be odd.
    • Do not add more than three Ceph Monitors at a time and wait until the Ceph cluster is Ready before adding more daemons.
    • For better HA and fault tolerance, the number of mgr roles must equal the number of mon roles. Therefore, we recommend labeling at least three Ceph nodes with the mgr role.
    • If rgw roles are not specified, all rgw daemons will spawn on the same nodes with mon daemons.

    If a Ceph node contains a mon role, the Ceph Monitor Pod deploys on this node.

    If a Ceph node contains a mgr role, it informs the Ceph Controller that a Ceph Manager can be deployed on the node. Rook Operator selects the first available node to deploy the Ceph Manager on it. Pelagia supports deploying two Ceph Managers in total: one active and one stand-by.

    If you assign the mgr role to the three recommended Ceph nodes, one backup Ceph node remains available to redeploy a failed Ceph Manager in case of a node outage.

  • monitorIP - Optional. If defined, specifies a custom IP address for the Ceph Monitor placed on the node. If not set, the Ceph Monitor on the node uses the default hostNetwork address of the node. We generally recommend using an IP address from the Ceph public network range. See the example at the end of this section.

    Note

    To update monitorIP, re-create the corresponding Ceph Monitor daemon.

  • config - Mandatory. Specifies a map of device configurations that must contain a mandatory deviceClass parameter set to hdd, ssd, or nvme. Applied for all OSDs on the current node. Can be overridden by the config parameter of each device defined in the devices parameter.

    For details, see Rook documentation: OSD config settings.

  • devices - Optional. Specifies the list of devices to use for Ceph OSD deployment. Includes the following parameters:

    Note

    We recommend using the fullPath field to define by-id symlinks as persistent device identifiers. For details, see Addressing Ceph devices.

    • fullPath - a storage device symlink. Accepts the following values:

      • The device by-id symlink that contains the serial number of the physical device and does not contain wwn. For example, /dev/disk/by-id/nvme-SAMSUNG_MZ1LB3T8HMLA-00007_S46FNY0R394543.

      • The device by-path symlink. For example, /dev/disk/by-path/pci-0000:00:11.4-ata-3. We do not recommend specifying storage devices with device by-path symlinks because such identifiers are not persistent and can change at node boot.

      This parameter is mutually exclusive with name.

    • name - a storage device name. Accepts the following values:

      • The device name, for example, sdc. We do not recommend specifying storage devices with device names because such identifiers are not persistent and can change at node boot.
      • The device by-id symlink that contains the serial number of the physical device and does not contain wwn. For example, /dev/disk/by-id/nvme-SAMSUNG_MZ1LB3T8HMLA-00007_S46FNY0R394543.
      • The device label from extraOpts.deviceLabels section which is generally used for templating Ceph node specification for node groups. For details, see Extra options.

      This parameter is mutually exclusive with fullPath.

    • config - a map of device configurations that must contain a mandatory deviceClass parameter set to hdd, ssd, or nvme. The device class must be defined in a pool. The map can optionally contain a metadata device, for example:

      spec:
        nodes:
        - name: <node-a>
          devices:
          - fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
            config:
              deviceClass: hdd
              metadataDevice: /dev/meta-1/nvme-dev-1
              osdsPerDevice: "2"
      

      The underlying storage format to use for Ceph OSDs is BlueStore.

      The metadataDevice parameter accepts a device name or logical volume path for the BlueStore device. We recommend using logical volume paths created on nvme devices.

      The osdsPerDevice parameter accepts a natural number as a string and allows splitting one device into several Ceph OSD daemons. We recommend using this parameter only for ssd or nvme disks.

  • deviceFilter - Optional. Specifies a regular expression that matches names of devices to use for Ceph OSD deployment. Mutually exclusive with devices and devicePathFilter. Requires the config parameter with deviceClass specified. For example:

    spec:
      nodes:
      - name: <node-a>
        deviceFilter: "^sd[def]$"
        config:
          deviceClass: hdd
    

    For more details, see Rook documentation: Storage selection settings.

  • devicePathFilter - Optional. Specifies a regular expression that matches paths of devices to use for Ceph OSD deployment. Mutually exclusive with devices and deviceFilter. Requires the config parameter with deviceClass specified. For example:

    spec:
      nodes:
      - name: <node-a>
        devicePathFilter: "^/dev/disk/by-id/scsi-SATA.+$"
        config:
          deviceClass: hdd
    

    For more details, see Rook documentation: Storage selection settings.

  • crush - Optional. Specifies the explicit key-value CRUSH topology for a node. For details, see Ceph documentation: CRUSH maps. Includes the following parameters:

    • datacenter - a physical data center that consists of rooms and handles data.
    • room - a room that accommodates one or more racks with hosts.
    • pdu - a power distribution unit (PDU) device that has multiple outputs and distributes electric power to racks located within a data center.
    • row - a row of computing racks inside room.
    • rack - a computing rack that accommodates one or more hosts.
    • chassis - a bare metal structure that houses or physically assembles hosts.
    • region - the geographic location of one or more Ceph Object instances within one or more zones.
    • zone - a logical group that consists of one or more Ceph Object instances.

    Example configuration:

    spec:
      nodes:
      - name: <node-a>
        crush:
          datacenter: dc1
          room: room1
          pdu: pdu1
          row: row1
          rack: rack1
          chassis: ch1
          region: region1
          zone: zone1
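
The following minimal sketch illustrates the monitorIP parameter described above. It assumes a monitor node named cluster-storage-controlplane-0, as in the example at the top of this page, and an illustrative address from the Ceph public network range:

spec:
  nodes:
  - name: cluster-storage-controlplane-0
    roles:
    - mgr
    - mon
    monitorIP: 192.100.0.11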
    

Pools parameters #

  • name - Mandatory. Specifies the pool name as a prefix for each Ceph block pool. The resulting Ceph block pool name will be <name>-<deviceClass>.
  • useAsFullName - Optional. Enables Ceph block pool to use only the name value as a name. The resulting Ceph block pool name will be <name> without the deviceClass suffix.
  • role - Optional. Specifies the pool role for Rockoon integration.
  • preserveOnDelete - Optional. Enables skipping Ceph pool deletion when the corresponding pools section item is removed. If an item is removed with this flag enabled, the related CephBlockPool object is kept untouched and requires manual deletion on demand. Defaults to false.
  • storageClassOpts - Optional. Allows configuring parameters of the StorageClass created for the RBD pool; see the sketch after this list. Includes the following parameters:

    • default - Optional. Defines whether the pool and dependent StorageClass must be set as default. Must be enabled only for one pool. Defaults to false.
    • mapOptions - Optional. Not updatable as it applies only once. Specifies custom rbd device map options to use with StorageClass of a corresponding pool. Allows customizing the Kubernetes CSI driver interaction with Ceph RBD for the defined StorageClass. For available options, see Ceph documentation: Kernel RBD (KRBD) options.
    • unmapOptions - Optional. Not updatable as it applies only once. Specifies custom rbd device unmap options to use with StorageClass of a corresponding pool. Allows customizing the Kubernetes CSI driver interaction with Ceph RBD for the defined StorageClass. For available options, see Ceph documentation: Kernel RBD (KRBD) options.
    • imageFeatures - Optional. Not updatable as it applies only once. Specifies a comma-separated list of RBD image features. For details, see Ceph documentation: Manage Rados block device (RBD) images.
    • reclaimPolicy - Optional. Specifies reclaim policy for the underlying StorageClass of the pool. Accepts Retain and Delete values. Default is Delete if not set.
    • allowVolumeExpansion - Optional. Not updatable as it applies only once. Enables expansion of persistent volumes based on StorageClass of a corresponding pool. For details, see Kubernetes documentation: Resizing persistent volumes using Kubernetes.

      Note

      A Kubernetes cluster only supports increase of storage size.

  • deviceClass - Mandatory. Specifies the device class for the defined pool. Common possible values are hdd, ssd and nvme. Also allows customized device classes, refers to Extra options.

  • replicated - The replicated parameter is mutually exclusive with erasureCoded and includes the following parameters:

    • size - the number of pool replicas.
    • targetSizeRatio - a float percentage from 0.0 to 1.0, which specifies the expected consumption of the total Ceph cluster capacity.

      Note

      Mirantis recommends defining target ratio with the parameters.target_size_ratio string field instead.

  • erasureCoded - Enables the erasure-coded pool. For details, see Rook documentation: Erasure Coded RBD Pool and Ceph documentation: Erasure coded pool. The erasureCoded parameter is mutually exclusive with replicated.

  • failureDomain - Optional. The failure domain across which the replicas or chunks of data will be spread. Set to host by default. The list of possible recommended values includes: host, rack, room, and datacenter.

    Caution

    We do not recommend using the following intermediate topology keys: pdu, row, chassis. Consider the rack topology instead. The osd failure domain is prohibited.

  • mirroring - Optional. Enables the mirroring feature for the defined pool. Includes the mode parameter that can be set to pool or image. For details, see Ops Guide: Enable RBD Mirroring.

  • parameters - Optional. Specifies the key-value map for the parameters of the Ceph pool. For details, see Ceph documentation: Set Pool values.
  • enableCrushUpdates - Optional. Enables automatic updates of the CRUSH map when the pool is created or updated. Defaults to false.
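
A hedged sketch of the storageClassOpts parameters described above; the option values are illustrative examples rather than recommendations:

spec:
  pools:
  - name: kubernetes
    deviceClass: ssd
    replicated:
      size: 3
    storageClassOpts:
      default: true
      reclaimPolicy: Retain
      allowVolumeExpansion: true
      imageFeatures: "layering,exclusive-lock"
      mapOptions: "noshare"
      unmapOptions: "force"
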
Example configuration of Pools specification
spec:
  pools:
  - name: kubernetes
    deviceClass: hdd
    replicated:
      size: 3
    parameters:
      target_size_ratio: "10.0"
    storageClassOpts:
      default: true
    preserveOnDelete: true
  - name: kubernetes
    deviceClass: nvme
    erasureCoded:
      codingChunks: 1
      dataChunks: 2
    failureDomain: host
  - name: archive
    useAsFullName: true
    deviceClass: hdd
    failureDomain: rack
    replicated:
      size: 3

As a result, the following Ceph pools will be created: kubernetes-hdd, kubernetes-nvme, and archive.

To configure additional required pools for Rockoon, see Ops Guide: Integrate Pelagia with Rockoon.

Caution

Since Ceph Pacific, Ceph CSI driver does not propagate the 777 permission on the mount point of persistent volumes based on any StorageClass of the Ceph pool.

Clients parameters #

Example configuration of Clients specification
spec:
  clients:
  - name: test-client
    caps:
      mon: allow r, allow command "osd blacklist"
      osd: profile rbd pool=kubernetes-nvme

RADOS Gateway parameters #

  • name - Required. Ceph Object Storage instance name.
  • dataPool - Required if zone.name is not specified. Mutually exclusive with zone. Must be used together with metadataPool.

    Object storage data pool spec that must only contain replicated or erasureCoded, deviceClass, and failureDomain parameters. The failureDomain parameter can be set to host, rack, room, or datacenter, defining the failure domain across which the data will be spread. The deviceClass must be explicitly defined. For dataPool, we recommend using an erasureCoded pool. For example:

    spec:
       objectStorage:
         rgw:
           dataPool:
             deviceClass: hdd
             failureDomain: host
             erasureCoded:
               codingChunks: 1
               dataChunks: 2
    
  • metadataPool - Required if zone.name is not specified. Mutually exclusive with zone. Must be used together with dataPool.

    Object storage metadata pool spec that must only contain replicated, deviceClass and failureDomain parameters. The failureDomain parameter may be set to host, rack, room, or datacenter, defining the failure domain across which the data will be spread. The deviceClass must be explicitly defined. Can use only replicated settings. For example:

    spec:
       objectStorage:
         rgw:
           metadataPool:
             deviceClass: hdd
             failureDomain: host
             replicated:
               size: 3
    

    where replicated.size is the number of full copies of data on multiple nodes.

    Warning

    When using the non-recommended Ceph pools replicated.size of less than 3, Ceph OSD removal cannot be performed. The minimal replica size equals a rounded up half of the specified replicated.size.

    For example, if replicated.size is 2, the minimal replica size is 1, and if replicated.size is 3, then the minimal replica size is 2. The replica size of 1 allows Ceph having PGs with only one Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED health warning that blocks Ceph OSD removal. We recommend setting replicated.size to 3 for each Ceph pool.

  • gateway - Required. The gateway settings corresponding to the rgw daemon settings. Includes the following parameters:

    • port - the port on which the Ceph RGW service listens for HTTP connections.
    • securePort - the port on which the Ceph RGW service listens for HTTPS connections.
    • instances - the number of pods in the Ceph RGW ReplicaSet. If allNodes is set to true, a DaemonSet is created instead.

      Note

      We recommend using 3 instances for Ceph Object Storage.

    • allNodes - defines whether to start the Ceph RGW pods as a DaemonSet on all nodes. The instances parameter is ignored if allNodes is set to true.

    • splitDaemonForMultisiteTrafficSync - Optional. For a multisite setup, defines whether to split the RGW daemon into a daemon responsible for synchronization between zones and a daemon serving client requests.
    • rgwSyncPort - Optional. The HTTP port on which the RGW multisite traffic service listens. Takes effect only in a multisite configuration.
    • resources - Optional. Represents Kubernetes resource requirements for Ceph RGW pods. For details, see Kubernetes docs: Resource Management for Pods and Containers.
    • externalRgwEndpoint - Required for an external Ceph cluster setup. Represents the external RGW endpoint to use when an external Ceph cluster is used. Contains the following parameters:

      • ip - represents the IP address of RGW endpoint.
      • hostname - represents the DNS-addressable hostname of RGW endpoint. This field will be preferred over IP if both are given.
      spec:
        objectStorage:
          rgw:
            gateway:
              allNodes: false
              instances: 3
              port: 80
              securePort: 8443
      
  • preservePoolsOnDelete - Optional. Defines whether to preserve the data and metadata pools in the rgw section if the Object Storage is deleted. Set this parameter to true if you need to keep the data even if the object storage is deleted. However, we recommend setting this parameter to false.

  • objectUsers and buckets - Optional. To create new Ceph RGW resources, such as buckets or users, specify the following keys. Ceph Controller will automatically create the specified object storage users and buckets in the Ceph cluster.

    • objectUsers - a list of user specifications to create for object storage. Contains the following fields:

      • name - a user name to create.
      • displayName - the Ceph user name to display.
      • capabilities - user capabilities:

        • user - admin capabilities to read/write Ceph Object Store users.
        • bucket - admin capabilities to read/write Ceph Object Store buckets.
        • metadata - admin capabilities to read/write Ceph Object Store metadata.
        • usage - admin capabilities to read/write Ceph Object Store usage.
        • zone - admin capabilities to read/write Ceph Object Store zones.

        The available values are *, read, write, and the combined read, write. For details, see Ceph documentation: Add/remove admin capabilities.

      • quotas - user quotas:

        • maxBuckets - the maximum bucket limit for the Ceph user. Integer, for example, 10.
        • maxSize - the maximum size limit of all objects across all the buckets of a user. String size, for example, 10G.
        • maxObjects - the maximum number of objects across all buckets of a user. Integer, for example, 10.
      spec:
        objectStorage:
          rgw:
            objectUsers:
            - name: test-user
              displayName: test-user
              capabilities:
                bucket: '*'
                metadata: read
                user: read
              quotas:
                maxBuckets: 10
                maxSize: 10G
      
    • buckets - a list of strings that contain bucket names to create for object storage.

  • zone - Required if dataPool and metadataPool are not specified. Mutually exclusive with these parameters.

    Defines the Ceph Multisite zone where the object storage must be placed. Includes the name parameter that must be set to one of the zones items. For details, see the Ops Guide: Enable Multisite for Ceph Object Storage.

    spec:
      objectStorage:
        multisite:
          zones:
          - name: master-zone
            ...
        rgw:
          zone:
            name: master-zone
    
  • SSLCert - Optional. Custom TLS certificate parameters used to access the Ceph RGW endpoint. If not specified, a self-signed certificate will be generated.

    spec:
      objectStorage:
        rgw:
          SSLCert:
            cacert: |
              -----BEGIN CERTIFICATE-----
              ca-certificate here
              -----END CERTIFICATE-----
            tlsCert: |
              -----BEGIN CERTIFICATE-----
              private TLS certificate here
              -----END CERTIFICATE-----
            tlsKey: |
              -----BEGIN RSA PRIVATE KEY-----
              private TLS key here
              -----END RSA PRIVATE KEY-----
    
  • SSLCertInRef - Optional. Flag indicating that a TLS certificate for accessing the Ceph RGW endpoint is used but not exposed in the spec. For example:

    spec:
      objectStorage:
        rgw:
          SSLCertInRef: true
    

    The operator must manually provide TLS configuration using the rgw-ssl-certificate secret in the rook-ceph namespace of the managed cluster. The secret object must have the following structure:

    data:
      cacert: <base64encodedCaCertificate>
      cert: <base64encodedCertificate>
    

    When removing an already existing SSLCert block, no additional actions are required, because this block uses the same rgw-ssl-certificate secret in the rook-ceph namespace.

    When adding a new secret directly without exposing it in spec, the following rules apply:

    • cert - base64 representation of a file with the server TLS key, server TLS cert, and CA certificate.
    • cacert - base64 representation of a CA certificate only.
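
    A hedged sketch of creating that secret with kubectl, assuming the certificate bundle and the CA certificate are stored in local files named cert.pem and cacert.pem (illustrative file names):

    kubectl -n rook-ceph create secret generic rgw-ssl-certificate \
      --from-file=cert=cert.pem \
      --from-file=cacert=cacert.pem
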
Example configuration of RADOS gateway specification
spec:
  objectStorage:
    rgw:
      name: rgw-store
      dataPool:
        deviceClass: hdd
        erasureCoded:
          codingChunks: 1
          dataChunks: 2
        failureDomain: host
      metadataPool:
        deviceClass: hdd
        failureDomain: host
        replicated:
          size: 3
      gateway:
        allNodes: false
        instances: 3
        port: 80
        securePort: 8443
      preservePoolsOnDelete: false

RADOS Gateway Multisite parameters #

Technical Preview

  • realms - Required. List of realms to use, represents the realm namespaces. Includes the following parameters:

    • name - required, the realm name.
    • pullEndpoint - optional, required only when the master zone is in a different storage cluster. The endpoint, access key, and system key of the system user from the realm to pull from. Includes the following parameters:

      • endpoint - the endpoint of the master zone in the master zone group.
      • accessKey - the access key of the system user from the realm to pull from.
      • secretKey - the system key of the system user from the realm to pull from.
  • zoneGroups - Required. The list of zone groups for realms. Includes the following parameters:

    • name - required, the zone group name.
    • realmName - required, the name of the realm namespace to which the zone group belongs.
  • zones - Required. The list of zones used within one zone group. Includes the following parameters:

    • name - required, the zone name.
    • metadataPool - required, the settings used to create the Object Storage metadata pools. Must use replication. For details, see description of Pool parameters.
    • dataPool - required, the settings used to create the Object Storage data pool. Can use replication or erasure coding. For details, see Pool parameters.
    • zoneGroupName - required, the zone group name.
    • endpointsForZone - optional. The list of all endpoints in the zone group. If you use an ingress proxy for RGW, the list of endpoints must contain the FQDN/IP address used to access RGW. By default, if no ingress proxy is used, the list of endpoints is set to the IP address of the RGW external service. Endpoints must follow the HTTP URL format.

For configuration example, see Ops Guide: Enable Multisite for Ceph Object Storage.
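
The following minimal sketch only illustrates how the fields described above fit together; all names are placeholders, and the Ops Guide example remains the reference configuration:

spec:
  objectStorage:
    multisite:
      realms:
      - name: realm-main
      zoneGroups:
      - name: zonegroup-main
        realmName: realm-main
      zones:
      - name: master-zone
        zoneGroupName: zonegroup-main
        metadataPool:
          deviceClass: ssd
          replicated:
            size: 3
        dataPool:
          deviceClass: ssd
          erasureCoded:
            codingChunks: 1
            dataChunks: 2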

CephFS parameters #

sharedFilesystem contains a list of Ceph Filesystems cephFS. Each cephFS item contains the following parameters:

  • name - Mandatory. CephFS instance name.
  • dataPools - A list of CephFS data pool specifications. Each spec contains the name, replicated or erasureCoded, deviceClass, and failureDomain parameters. The first pool in the list is treated as the default data pool for CephFS and must always be replicated. The number of data pools is unlimited, but the default pool must always be present. For example:

    spec:
      sharedFilesystem:
        cephFS:
        - name: cephfs-store
          dataPools:
          - name: default-pool
            deviceClass: ssd
            replicated:
              size: 3
            failureDomain: host
          - name: second-pool
            deviceClass: hdd
            failureDomain: rack
            erasureCoded:
              dataChunks: 2
              codingChunks: 1
    

    where replicated.size is the number of full copies of data on multiple nodes.

    Warning

    When using the non-recommended Ceph pools replicated.size of less than 3, Ceph OSD removal cannot be performed. The minimal replica size equals a rounded up half of the specified replicated.size.

    For example, if replicated.size is 2, the minimal replica size is 1, and if replicated.size is 3, then the minimal replica size is 2. The replica size of 1 allows Ceph having PGs with only one Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED health warning that blocks Ceph OSD removal. We recommend setting replicated.size to 3 for each Ceph pool.

    Warning

    Modifying dataPools on a deployed CephFS has no effect. You can manually adjust pool settings through the Ceph CLI. However, for any changes in dataPools, we recommend re-creating CephFS.

  • metadataPool - CephFS metadata pool spec that should only contain replicated, deviceClass, and failureDomain parameters. Can use only replicated settings. For example:

    spec:
      sharedFilesystem:
        cephFS:
        - name: cephfs-store
          metadataPool:
            deviceClass: nvme
            replicated:
              size: 3
            failureDomain: host
    

    where replicated.size is the number of full copies of data on multiple nodes.

    Warning

    Modifying metadataPool on a deployed CephFS has no effect. You can manually adjust pool settings through the Ceph CLI. However, for any changes in metadataPool, we recommend re-creating CephFS.

  • preserveFilesystemOnDelete - Defines whether to preserve the data and metadata pools if CephFS is deleted. Set to true to avoid occasional data loss in case of human error. However, for security reasons, we recommend setting preserveFilesystemOnDelete to false.

  • metadataServer - Metadata Server settings correspond to the Ceph MDS daemon settings. Contains the following fields:

    • activeCount - the number of active Ceph MDS instances. As load increases, CephFS will automatically partition the file system across the Ceph MDS instances. Rook will create double the number of Ceph MDS instances as requested by activeCount. The extra instances will be in the standby mode for failover. We recommend setting this parameter to 1 and increasing the MDS daemon count only in case of high load.
    • activeStandby - defines whether the extra Ceph MDS instances will be in active standby mode and will keep a warm cache of the file system metadata for faster failover. The instances will be assigned by CephFS in failover pairs. If false, the extra Ceph MDS instances will all be in passive standby mode and will not maintain a warm cache of the metadata. The default value is false.
    • resources - represents Kubernetes resource requirements for Ceph MDS pods. For details, see Kubernetes docs: Resource Management for Pods and Containers.
    spec:
      sharedFilesystem:
        cephFS:
        - name: cephfs-store
          metadataServer:
            activeCount: 1
            activeStandby: false
            resources: # example, non-prod values
              requests:
                memory: 1Gi
                cpu: 1
              limits:
                memory: 2Gi
                cpu: 2
    
Example configuration of shared Filesystem specification
spec:
  sharedFilesystem:
    cephFS:
    - name: cephfs-store
      dataPools:
      - name: cephfs-pool-1
        deviceClass: hdd
        replicated:
          size: 3
        failureDomain: host
      metadataPool:
        deviceClass: nvme
        replicated:
          size: 3
        failureDomain: host
      metadataServer:
        activeCount: 1
        activeStandby: false

RookConfig parameters #

String key-value parameter that allows overriding Ceph configuration options.

Use the | delimiter to specify the section where a parameter must be placed, for example, mon or osd. If required, use the . delimiter to specify the exact number of a Ceph OSD or Ceph Monitor and apply an option to that specific mon or osd, overriding the configuration of the corresponding section.

Specifying a section restarts only the daemons related to that section. If you do not specify a section, the parameter is set in the global section, which triggers a restart of all Ceph daemons except Ceph OSDs.

spec:
  rookConfig:
    "osd_max_backfills": "64"
    "mon|mon_health_to_clog":  "true"
    "osd|osd_journal_size": "8192"
    "osd.14|osd_journal_size": "6250"

HealthCheck parameters #

  • daemonHealth - Optional. Specifies health check settings for Ceph daemons. Contains the following parameters:

    • status - configures health check settings for Ceph health
    • mon - configures health check settings for Ceph Monitors
    • osd - configures health check settings for Ceph OSDs

    Each parameter allows defining the following settings:

    • disabled - a flag that disables the health check.
    • interval - an interval in seconds or minutes for the health check to run. For example, 60s for 60 seconds.
    • timeout - a timeout for the health check in seconds or minutes. For example, 60s for 60 seconds.
  • livenessProbe - Optional. Key-value parameter with liveness probe settings for the defined daemon types. Can be one of the following: mgr, mon, osd, or mds. Includes the disabled flag and the probe parameter. The probe parameter accepts the following options:

    • initialDelaySeconds - the number of seconds after the container has started before the liveness probes are initiated. Integer.
    • timeoutSeconds - the number of seconds after which the probe times out. Integer.
    • periodSeconds - the frequency (in seconds) to perform the probe. Integer.
    • successThreshold - the minimum consecutive successful probes for the probe to be considered successful after a failure. Integer.
    • failureThreshold - the minimum consecutive failures for the probe to be considered failed after having succeeded. Integer.

    Note

    Pelagia Deployment Controller specifies the following livenessProbe defaults for mon, mgr, osd, and mds (if CephFS is enabled):

    • 5 for timeoutSeconds
    • 5 for failureThreshold
  • startupProbe - Optional. Key-value parameter with startup probe settings for the defined daemon types. Can be one of the following: mgr, mon, osd, or mds. Includes the disabled flag and the probe parameter. The probe parameter accepts the following options:

    • timeoutSeconds - the number of seconds after which the probe times out. Integer.
    • periodSeconds - the frequency (in seconds) to perform the probe. Integer.
    • successThreshold - the minimum consecutive successful probes for the probe to be considered successful after a failure. Integer.
    • failureThreshold - the minimum consecutive failures for the probe to be considered failed after having succeeded. Integer.
Example configuration of health check specification
spec:
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
        timeout: 600s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: true
    livenessProbe:
      mon:
        disabled: false
        probe:
          timeoutSeconds: 10
          periodSeconds: 3
          successThreshold: 3
      mgr:
        disabled: false
        probe:
          timeoutSeconds: 5
          failureThreshold: 5
      osd:
        probe:
          initialDelaySeconds: 5
          timeoutSeconds: 10
          failureThreshold: 7
    startupProbe:
      mon:
        disabled: true
      mgr:
        probe:
          successThreshold: 3

ExtraOpts parameters #

  • deviceLabels - Optional. A key-value mapping which is used to assign a specification label to any available device on a specific node. These labels can then be used for the nodes section items with nodeGroup or nodesByLabel defined to eliminate the need to specify different devices for each node individually. Additionally, it helps in avoiding the use of device names, facilitating the grouping of nodes with similar labels.

    Example usage:

    spec:
      extraOpts:
        deviceLabels:
          <node-name>:
            <dev-label>: /dev/disk/by-id/<unique_ID>
            ...
            <dev-label-n>: /dev/disk/by-id/<unique_ID>
          ...
          <node-name-n>:
            <dev-label>: /dev/disk/by-id/<unique_ID>
            ...
            <dev-label-n>: /dev/disk/by-id/<unique_ID>
      nodes:
      - name: <group-name>
        devices:
        - name: <dev_label>
        - name: <dev_label_n>
        nodes:
        - <node_name>
        - <node_name_n>
    
  • customDeviceClasses - Optional. TechPreview. A list of custom device class names to use in the specification. Enables you to specify the custom names different from the default ones, which include ssd, hdd, and nvme, and use them in nodes and pools definitions.

    Example usage:

    spec:
      extraOpts:
        customDeviceClasses:
        - <custom_class_name>
      nodes:
      - name: kaas-node-5bgk6
        devices:
        - config: # existing item
            deviceClass: <custom_class_name>
          fullPath: /dev/disk/by-id/<unique_ID>
      pools:
      - default: false
        deviceClass: <custom_class_name>
        erasureCoded:
          codingChunks: 1
          dataChunks: 2
        failureDomain: host
    

Manager modules parameters #

The mgr section of the CephDeployment specification contains the mgrModules parameter, which includes the following parameters:

  • name - Ceph Manager module name.
  • enabled - Flag that defines whether the Ceph Manager module is enabled.

    For example:

    spec:
      mgr:
        mgrModules:
        - name: balancer
          enabled: true
        - name: pg_autoscaler
          enabled: true
    

    The balancer and pg_autoscaler Ceph Manager modules are enabled by default and cannot be disabled.

Note

Most Ceph Manager modules require additional configuration that you can perform through the pelagia-lcm-toolbox pod.
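
For example, to list the Ceph Manager modules from the toolbox, assuming the toolbox is exposed as a Deployment named pelagia-lcm-toolbox in the pelagia namespace (an assumption; adjust the name and namespace to your setup):

kubectl -n pelagia exec -it deploy/pelagia-lcm-toolbox -- ceph mgr module ls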

RBD Mirroring parameters #

  • daemonsCount - Count of rbd-mirror daemons to spawn. We recommend using one instance of the rbd-mirror daemon.
  • peers - Optional. List of mirroring peers of an external cluster to connect to. Only a single peer is supported. The peer section includes the following parameters:

    • site - the label of a remote Ceph cluster associated with the token.
    • token - the token that will be used by one site (Ceph cluster) to pull images from the other site. To obtain the token, use the rbd mirror pool peer bootstrap create command.
    • pools - optional, a list of pool names to mirror.
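
A hedged sketch of the rbdMirror section based on the fields described above; the site label, token, and pool name are placeholders:

spec:
  rbdMirror:
    daemonsCount: 1
    peers:
    - site: remote-site
      token: <token-from-rbd-mirror-pool-peer-bootstrap-create>
      pools:
      - kubernetes-ssd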

Status fields #

  • phase - Current handling phase of the applied Ceph cluster spec. Can be one of Creating, Deploying, Validation, Ready, Deleting, OnHold, or Failed.
  • message - Detailed description of the current phase or an error message if the phase is Failed.
  • lastRun - Date and time when the previous spec reconcile occurred.
  • clusterVersion - Current Ceph cluster version, for example, v19.2.3.
  • validation - Validation result (Succeed or Failed) of the spec with a list of messages, if any. The validation section includes the following fields:

    • result - Succeed or Failed
    • messages - the list of error messages
    • lastValidatedGeneration - the last validated metadata.generation of CephDeployment
  • objRefs - Pelagia API object references, such as CephDeploymentHealth and CephDeploymentSecret.