
CephDeploymentHealth custom resource

Verifying the Ceph cluster state is the entry point for investigating issues. The CephDeploymentHealth (cephdeploymenthealths.lcm.mirantis.com) custom resource (CR) allows you to verify the current health of a Ceph cluster and identify potentially problematic components. To obtain the detailed status of the Rook resources and the Ceph cluster that Pelagia manages:

kubectl -n pelagia get cephdeploymenthealth -o yaml
Example CephDeploymentHealth status
apiVersion: v1
items:
- apiVersion: lcm.mirantis.com/v1alpha1
  kind: CephDeploymentHealth
  metadata:
    name: pelagia-ceph
    namespace: pelagia
  status:
    healthReport:
      cephDaemons:
        cephCSIPluginDaemons:
          csi-cephfsplugin:
            info:
            - 3/3 ready
            status: ok
          csi-rbdplugin:
            info:
            - 3/3 ready
            status: ok
        cephDaemons:
          mds:
            info:
            - 'mds active: 1/1 (cephfs ''cephfs-store'')'
            status: ok
          mgr:
            info:
            - 'a is active mgr, standbys: [b]'
            status: ok
          mon:
            info:
            - 3 mons, quorum [a b c]
            status: ok
          osd:
            info:
            - 3 osds, 3 up, 3 in
            status: ok
          rgw:
            info:
            - '2 rgws running, daemons: [21273 38213]'
            status: ok
      clusterDetails:
        cephEvents:
          PgAutoscalerDetails:
            state: Idle
          rebalanceDetails:
            state: Idle
        rgwInfo:
          publicEndpoint: https://192.10.1.101:443
        usageDetails:
          deviceClasses:
            hdd:
              availableBytes: "159676964864"
              totalBytes: "161048690688"
              usedBytes: "1371725824"
          pools:
            .mgr:
              availableBytes: "75660169216"
              totalBytes: "75661557760"
              usedBytes: "1388544"
              usedBytesPercentage: "0.001"
            .rgw.root:
              availableBytes: "75661426688"
              totalBytes: "75661557760"
              usedBytes: "131072"
              usedBytesPercentage: "0.000"
            cephfs-store-cephfs-pool-1:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            cephfs-store-metadata:
              availableBytes: "75660517376"
              totalBytes: "75661557760"
              usedBytes: "1040384"
              usedBytesPercentage: "0.001"
            kubernetes-hdd:
              availableBytes: "75661549568"
              totalBytes: "75661557760"
              usedBytes: "8192"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.buckets.data:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.buckets.index:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.buckets.non-ec:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.control:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.log:
              availableBytes: "75660230656"
              totalBytes: "75661557760"
              usedBytes: "1327104"
              usedBytesPercentage: "0.001"
            rgw-store.rgw.meta:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.otp:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
      osdAnalysis:
        cephClusterSpecGeneration: 1
        diskDaemon:
          info:
          - 3/3 ready
          status: ok
        specAnalysis:
          cluster-storage-worker-0:
            status: ok
          cluster-storage-worker-1:
            status: ok
          cluster-storage-worker-2:
            status: ok
      rookCephObjects:
        blockStorage:
          cephBlockPools:
            builtin-mgr:
              info:
                failureDomain: host
                type: Replicated
              observedGeneration: 1
              phase: Ready
              poolID: 11
            builtin-rgw-root:
              info:
                failureDomain: host
                type: Replicated
              observedGeneration: 1
              phase: Ready
              poolID: 1
            kubernetes-hdd:
              info:
                failureDomain: host
                type: Replicated
              observedGeneration: 1
              phase: Ready
              poolID: 10
        cephCluster:
          ceph:
            capacity:
              bytesAvailable: 159676964864
              bytesTotal: 161048690688
              bytesUsed: 1371725824
              lastUpdated: "2025-08-15T12:10:39Z"
            fsid: 92d56f80-b7a8-4a35-80ef-eb6a877c2a73
            health: HEALTH_OK
            lastChanged: "2025-08-14T14:07:43Z"
            lastChecked: "2025-08-15T12:10:39Z"
            previousHealth: HEALTH_WARN
            versions:
              mds:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 2
              mgr:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 2
              mon:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 3
              osd:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 3
              overall:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 12
              rgw:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 2
          conditions:
          - lastHeartbeatTime: "2025-08-15T12:10:40Z"
            lastTransitionTime: "2025-08-12T09:35:27Z"
            message: Cluster created successfully
            reason: ClusterCreated
            status: "True"
            type: Ready
          message: Cluster created successfully
          observedGeneration: 1
          phase: Ready
          state: Created
          storage:
            deviceClasses:
            - name: hdd
            osd:
              migrationStatus: {}
              storeType:
                bluestore: 3
          version:
            image: 127.0.0.1/ceph/ceph:v19.2.3
            version: 19.2.3-0
        objectStorage:
          cephObjectStore:
            rgw-store:
              endpoints:
                insecure:
                - http://rook-ceph-rgw-rgw-store.rook-ceph.svc:8081
                secure:
                - https://rook-ceph-rgw-rgw-store.rook-ceph.svc:8443
              info:
                endpoint: http://rook-ceph-rgw-rgw-store.rook-ceph.svc:8081
                secureEndpoint: https://rook-ceph-rgw-rgw-store.rook-ceph.svc:8443
              observedGeneration: 1
              phase: Ready
        sharedFilesystem:
          cephFilesystems:
            cephfs-store:
              observedGeneration: 1
              phase: Ready
      rookOperator:
        status: ok
    lastHealthCheck: "2025-08-15T12:11:00Z"
    lastHealthUpdate: "2025-08-15T12:11:00Z"
    state: Ok
kind: List
metadata:
  resourceVersion: ""

High-level status fields

  • healthReport - Complete information about the Ceph cluster, including the cluster itself, Ceph resources, and daemon health. It helps reveal potentially problematic components.
  • lastHealthCheck - Date and time of the last cluster state check.
  • lastHealthUpdate - Date and time of the last cluster state update.
  • issues - List of all issues found during the cluster state check.
  • state - Cluster state, either Ok or Failed, depending on the result of the Ceph cluster state check.
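Where these fields feed automation, the following minimal Python sketch (illustrative only; it assumes the CR has already been fetched and parsed into a dict, for example from `kubectl -n pelagia get cephdeploymenthealth pelagia-ceph -o json`) shows how the high-level fields can be triaged:

```python
# Minimal triage of the high-level CephDeploymentHealth status fields.
# Assumes the CR was already fetched and parsed into a dict.

def triage(cr: dict) -> list[str]:
    """Return a list of human-readable problems, empty if the cluster is Ok."""
    status = cr.get("status", {})
    problems = []
    if status.get("state") != "Ok":
        problems.append(f"cluster state is {status.get('state')!r}")
    # `issues` is only populated when the state check found problems.
    problems.extend(status.get("issues", []))
    return problems

# Example with a healthy status stanza:
healthy = {"status": {"state": "Ok", "lastHealthCheck": "2025-08-15T12:11:00Z"}}
print(triage(healthy))  # -> []
```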

Health report status fields

  • rookOperator - State of the Rook Ceph Operator pod. Contains the following fields:

    • status - Short state of the current Rook Ceph Operator pod.
    • issues - Rook Ceph Operator issues found during the check; empty otherwise.
  • rookCephObjects - General information from Rook about the Ceph cluster health and current state. Contains the following fields:

    • cephCluster - Contains Ceph cluster status information.
    • blockStorage - Contains status information for block storage-related objects.
    • cephClients - Represents a key-value mapping of Ceph client names to their statuses.
    • objectStorage - Contains status information for object storage-related objects.
    • sharedFilesystems - Contains status information for shared filesystem-related objects.
    Example rookCephObjects status
    status:
      healthReport:
        rookCephObjects:
          cephCluster:
            state: <rook ceph cluster common status>
            phase: <rook ceph cluster spec reconcile phase>
            message: <rook ceph cluster phase details>
            conditions: <history of rook ceph cluster reconcile steps>
            ceph: <ceph cluster health>
            storage:
              deviceClasses: <list of used device classes in ceph cluster>
            version:
              image: <ceph image used in ceph cluster>
              version: <ceph version of ceph cluster>
          blockStorage:
            cephBlockPools:
              <cephBlockPoolName>:
                ...
                phase: <rook ceph block pool resource phase>
          cephClients:
            <cephClientName>:
              ...
              phase: <rook ceph client resource phase>
          objectStorage:
            cephObjectStore:
              <cephObjectStoreName>:
                ...
                phase: <rook ceph object store resource phase>
            cephObjectStoreUsers:
              <rgwUserName>:
                ...
                phase: <rook ceph object store user resource phase>
            objectBucketClaims:
              <bucketName>:
                ...
                phase: <rook ceph object bucket claims resource phase>
            cephObjectRealms:
              <realmName>:
                ...
                phase: <rook ceph object store realm resource phase>
            cephObjectZoneGroups:
              <zonegroupName>:
                ...
                phase: <rook ceph object store zonegroup resource phase>
            cephObjectZones:
              <zoneName>:
                ...
                phase: <rook ceph object store zone resource phase>
          sharedFilesystems:
            cephFilesystems:
              <cephFSName>:
                ...
                phase: <rook ceph filesystem resource phase>
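    The per-object phase fields above can be checked mechanically. The sketch below is illustrative only: it recurses over whatever object maps are present rather than assuming a fixed schema, and the "Failure" phase value in the sample data is a hypothetical, not taken from the example output above.

```python
# Walk the rookCephObjects subtree and collect any Rook-managed object
# whose `phase` is not "Ready".

def not_ready(objects: dict, path: str = "rookCephObjects") -> list[str]:
    bad = []
    for key, value in objects.items():
        if not isinstance(value, dict):
            continue  # skip scalar fields such as `state` or `message`
        here = f"{path}.{key}"
        phase = value.get("phase")
        if phase is not None and phase != "Ready":
            bad.append(f"{here}: phase={phase}")
        bad.extend(not_ready(value, here))
    return bad

example = {
    "blockStorage": {"cephBlockPools": {"kubernetes-hdd": {"phase": "Ready"}}},
    # "Failure" here is a hypothetical phase value for illustration:
    "sharedFilesystem": {"cephFilesystems": {"cephfs-store": {"phase": "Failure"}}},
}
print(not_ready(example))
# -> ['rookCephObjects.sharedFilesystem.cephFilesystems.cephfs-store: phase=Failure']
```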
    
  • cephDaemons - Contains information about the state of the Ceph and Ceph CSI daemons in the cluster. Includes the following fields:

    • cephDaemons - Map of statuses for each Ceph daemon type in the cluster. Indicates the expected and actual number of Ceph daemons. Available daemon types are: mds, mgr, mon, osd, and rgw.
    • cephCSIPluginDaemons - Contains information, in the same format, for each Ceph CSI plugin deployed in the Ceph cluster: rbd and cephfs.
    Example cephDaemons status
    status:
      healthReport:
        cephDaemons:
          cephCSIPluginDaemons:
            csi-cephfsplugin:
              info:
              - 3/3 ready
              status: ok
            csi-rbdplugin:
              info:
              - 3/3 ready
              status: ok
          cephDaemons:
            mds:
              info:
              - 'mds active: 1/1 (cephfs ''cephfs-store'')'
              status: ok
            mgr:
              info:
              - 'a is active mgr, standbys: [b]'
              status: ok
            mon:
              info:
              - 3 mons, quorum [a b c]
              status: ok
            osd:
              info:
              - 3 osds, 3 up, 3 in
              status: ok
            rgw:
              info:
              - '2 rgws running, daemons: [21273 38213]'
              status: ok
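    A small sketch of how this map can be summarized, assuming only the `status`/`info` shape shown above (the "failed" status value in the sample data is an assumption for illustration; the example output only shows "ok"):

```python
# Summarize the cephDaemons / cephCSIPluginDaemons maps shown above.
# Each entry has the shape {"status": "...", "info": [...]}; anything
# whose status is not "ok" is reported together with its info lines.

def unhealthy(daemons: dict) -> dict:
    return {
        name: state.get("info", [])
        for name, state in daemons.items()
        if state.get("status") != "ok"
    }

daemons = {
    "mon": {"info": ["3 mons, quorum [a b c]"], "status": "ok"},
    "osd": {"info": ["3 osds, 2 up, 3 in"], "status": "failed"},  # hypothetical
}
print(unhealthy(daemons))  # -> {'osd': ['3 osds, 2 up, 3 in']}
```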
    
  • clusterDetails - Verbose details of the Ceph cluster state. Contains the following fields:

    • usageDetails - Describes the used, available, and total storage size for each device class and pool.
    • cephEvents - Contains information about Ceph events currently happening in the cluster, if the progress events module is enabled.
    • rgwInfo - Represents additional Ceph Object Storage multisite information, such as the public endpoint for connecting an external zone, and sync statuses.
    Example clusterDetails status
    status:
      healthReport:
        clusterDetails:
          cephEvents:
            PgAutoscalerDetails:
              state: Idle
            rebalanceDetails:
              state: Idle
          rgwInfo:
            publicEndpoint: https://192.10.1.101:443
          usageDetails:
            deviceClasses:
              hdd:
                availableBytes: "159681224704"
                totalBytes: "161048690688"
                usedBytes: "1367465984"
            pools:
              .mgr:
                availableBytes: "75660169216"
                totalBytes: "75661557760"
                usedBytes: "1388544"
                usedBytesPercentage: "0.001"
              .rgw.root:
                availableBytes: "75661426688"
                totalBytes: "75661557760"
                usedBytes: "131072"
                usedBytesPercentage: "0.000"
              cephfs-store-cephfs-pool-1:
                availableBytes: "75661557760"
                totalBytes: "75661557760"
                usedBytes: "0"
                usedBytesPercentage: "0.000"
              cephfs-store-metadata:
                availableBytes: "75660517376"
                totalBytes: "75661557760"
                usedBytes: "1040384"
                usedBytesPercentage: "0.001"
              kubernetes-hdd:
                availableBytes: "75661549568"
                totalBytes: "75661557760"
                usedBytes: "8192"
                usedBytesPercentage: "0.000"
              rgw-store.rgw.buckets.data:
                availableBytes: "75661557760"
                totalBytes: "75661557760"
                usedBytes: "0"
                usedBytesPercentage: "0.000"
              ...
              rgw-store.rgw.otp:
                availableBytes: "75661557760"
                totalBytes: "75661557760"
                usedBytes: "0"
                usedBytesPercentage: "0.000"
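    As a small worked example of the usageDetails numbers: usedBytesPercentage corresponds to usedBytes / totalBytes expressed as a percentage. In the sample output the values appear truncated (not rounded) to three decimal places; that truncation is an observation from this example, not documented behavior. A sketch:

```python
# Recompute usedBytesPercentage from the string-encoded byte counters in
# usageDetails. The three-decimal truncation matches the sample output
# above; whether the controller truncates or rounds is an assumption here.

def used_pct(used_bytes: str, total_bytes: str) -> str:
    used, total = int(used_bytes), int(total_bytes)
    pct = used / total * 100                # fraction used, as a percentage
    return f"{int(pct * 1000) / 1000:.3f}"  # truncate to three decimals

# Values taken from the .mgr pool in the example above:
print(used_pct("1388544", "75661557760"))  # -> 0.001
```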
    
  • osdAnalysis - Ceph OSD analysis results based on the Rook CephCluster specification and disk daemon reports. Contains the following fields:

    • diskDaemon - Disk daemon status. The disk daemon is the Pelagia LCM component that provides information about node devices and their usage by Ceph OSDs.
    • cephClusterSpecGeneration - Last validated Rook CephCluster specification generation.
    • specAnalysis - Map of per-node analysis results based on the Rook CephCluster specification.
    Example osdAnalysis status
    status:
      healthReport:
        osdAnalysis:
          cephClusterSpecGeneration: 1
          diskDaemon:
            info:
            - 3/3 ready
            status: ok
          specAnalysis:
            cluster-storage-worker-0:
              status: ok
            cluster-storage-worker-1:
              status: ok
            cluster-storage-worker-2:
              status: ok
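    The per-node specAnalysis map can be reduced to a single pass/fail view. A minimal sketch, assuming only the `status` field shown above (the "failed" value in the sample data is hypothetical; treating anything other than "ok" as a problem is the conservative choice):

```python
# Reduce osdAnalysis.specAnalysis to the list of nodes needing attention.

def problem_nodes(spec_analysis: dict) -> list[str]:
    return sorted(
        node for node, result in spec_analysis.items()
        if result.get("status") != "ok"
    )

analysis = {
    "cluster-storage-worker-0": {"status": "ok"},
    "cluster-storage-worker-1": {"status": "failed"},  # hypothetical value
    "cluster-storage-worker-2": {"status": "ok"},
}
print(problem_nodes(analysis))  # -> ['cluster-storage-worker-1']
```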