
CephDeploymentHealth custom resource

Verifying the Ceph cluster state is the entry point for investigating issues. The CephDeploymentHealth (cephdeploymenthealths.lcm.mirantis.com) custom resource (CR) allows you to verify the current health of a Ceph cluster and identify potentially problematic components. To obtain the detailed status of the Rook resources and the Ceph cluster that Pelagia manages:

kubectl -n pelagia get cephdeploymenthealth -o yaml
Example CephDeploymentHealth status
apiVersion: v1
items:
- apiVersion: lcm.mirantis.com/v1alpha1
  kind: CephDeploymentHealth
  metadata:
    name: pelagia-ceph
    namespace: pelagia
  status:
    healthReport:
      cephDaemons:
        cephCSIPluginDaemons:
          csi-cephfsplugin:
            info:
            - 3/3 ready
            status: ok
          csi-rbdplugin:
            info:
            - 3/3 ready
            status: ok
        cephDaemons:
          mds:
            info:
            - 'mds active: 1/1 (cephfs ''cephfs-store'')'
            status: ok
          mgr:
            info:
            - 'a is active mgr, standbys: [b]'
            status: ok
          mon:
            info:
            - 3 mons, quorum [a b c]
            status: ok
          osd:
            info:
            - 3 osds, 3 up, 3 in
            status: ok
          rgw:
            info:
            - '2 rgws running, daemons: [21273 38213]'
            status: ok
      clusterDetails:
        cephEvents:
          PgAutoscalerDetails:
            state: Idle
          rebalanceDetails:
            state: Idle
        rgwInfo:
          publicEndpoint: https://192.10.1.101:443
        usageDetails:
          deviceClasses:
            hdd:
              availableBytes: "159676964864"
              totalBytes: "161048690688"
              usedBytes: "1371725824"
          pools:
            .mgr:
              availableBytes: "75660169216"
              totalBytes: "75661557760"
              usedBytes: "1388544"
              usedBytesPercentage: "0.001"
            .rgw.root:
              availableBytes: "75661426688"
              totalBytes: "75661557760"
              usedBytes: "131072"
              usedBytesPercentage: "0.000"
            cephfs-store-cephfs-pool-1:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            cephfs-store-metadata:
              availableBytes: "75660517376"
              totalBytes: "75661557760"
              usedBytes: "1040384"
              usedBytesPercentage: "0.001"
            kubernetes-hdd:
              availableBytes: "75661549568"
              totalBytes: "75661557760"
              usedBytes: "8192"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.buckets.data:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.buckets.index:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.buckets.non-ec:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.control:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.log:
              availableBytes: "75660230656"
              totalBytes: "75661557760"
              usedBytes: "1327104"
              usedBytesPercentage: "0.001"
            rgw-store.rgw.meta:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
            rgw-store.rgw.otp:
              availableBytes: "75661557760"
              totalBytes: "75661557760"
              usedBytes: "0"
              usedBytesPercentage: "0.000"
      osdAnalysis:
        cephClusterSpecGeneration: 1
        diskDaemon:
          info:
          - 3/3 ready
          status: ok
        specAnalysis:
          cluster-storage-worker-0:
            status: ok
          cluster-storage-worker-1:
            status: ok
          cluster-storage-worker-2:
            status: ok
      rookCephObjects:
        blockStorage:
          cephBlockPools:
            builtin-mgr:
              info:
                failureDomain: host
                type: Replicated
              observedGeneration: 1
              phase: Ready
              poolID: 11
            builtin-rgw-root:
              info:
                failureDomain: host
                type: Replicated
              observedGeneration: 1
              phase: Ready
              poolID: 1
            kubernetes-hdd:
              info:
                failureDomain: host
                type: Replicated
              observedGeneration: 1
              phase: Ready
              poolID: 10
        cephCluster:
          ceph:
            capacity:
              bytesAvailable: 159676964864
              bytesTotal: 161048690688
              bytesUsed: 1371725824
              lastUpdated: "2025-08-15T12:10:39Z"
            fsid: 92d56f80-b7a8-4a35-80ef-eb6a877c2a73
            health: HEALTH_OK
            lastChanged: "2025-08-14T14:07:43Z"
            lastChecked: "2025-08-15T12:10:39Z"
            previousHealth: HEALTH_WARN
            versions:
              mds:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 2
              mgr:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 2
              mon:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 3
              osd:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 3
              overall:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 12
              rgw:
                ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable): 2
          conditions:
          - lastHeartbeatTime: "2025-08-15T12:10:40Z"
            lastTransitionTime: "2025-08-12T09:35:27Z"
            message: Cluster created successfully
            reason: ClusterCreated
            status: "True"
            type: Ready
          message: Cluster created successfully
          observedGeneration: 1
          phase: Ready
          state: Created
          storage:
            deviceClasses:
            - name: hdd
            osd:
              migrationStatus: {}
              storeType:
                bluestore: 3
          version:
            image: 127.0.0.1/ceph/ceph:v19.2.3
            version: 19.2.3-0
        objectStorage:
          cephObjectStore:
            rgw-store:
              endpoints:
                insecure:
                - http://rook-ceph-rgw-rgw-store.rook-ceph.svc:8081
                secure:
                - https://rook-ceph-rgw-rgw-store.rook-ceph.svc:8443
              info:
                endpoint: http://rook-ceph-rgw-rgw-store.rook-ceph.svc:8081
                secureEndpoint: https://rook-ceph-rgw-rgw-store.rook-ceph.svc:8443
              observedGeneration: 1
              phase: Ready
        sharedFilesystem:
          cephFilesystems:
            cephfs-store:
              observedGeneration: 1
              phase: Ready
      rookOperator:
        status: ok
    lastHealthCheck: "2025-08-15T12:11:00Z"
    lastHealthUpdate: "2025-08-15T12:11:00Z"
    state: Ok
kind: List
metadata:
  resourceVersion: ""

High-level status fields

  • healthReport - Complete information about the Ceph cluster, including the cluster itself, Ceph resources, and daemon health. It helps reveal potentially problematic components.
  • lastHealthCheck - Date and time of the last cluster state check.
  • lastHealthUpdate - Date and time of the last cluster state update.
  • issues - List of all issues found during the cluster state check.
  • state - Cluster state, either Ok or Failed, depending on the result of the Ceph cluster state check.
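Where these fields feed automation, the following minimal Python sketch (illustrative only; it assumes the CR has already been fetched and parsed into a dict, for example from `kubectl -n pelagia get cephdeploymenthealth pelagia-ceph -o json`) shows how the high-level fields can be triaged:

```python
# Minimal triage of the high-level CephDeploymentHealth status fields.
# Assumes the CR was already fetched and parsed into a dict.

def triage(cr: dict) -> list[str]:
    """Return a list of human-readable problems, empty if the cluster is Ok."""
    status = cr.get("status", {})
    problems = []
    if status.get("state") != "Ok":
        problems.append(f"cluster state is {status.get('state')!r}")
    # `issues` is only populated when the state check found problems.
    problems.extend(status.get("issues", []))
    return problems

# Example with a healthy status stanza:
healthy = {"status": {"state": "Ok", "lastHealthCheck": "2025-08-15T12:11:00Z"}}
print(triage(healthy))  # -> []
```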

Health report status fields

  • rookOperator - State of the Rook Ceph Operator pod. Contains the following fields:

    • status - Short state of the current Rook Ceph Operator pod.
    • issues - Rook Ceph Operator issues found during the check; empty otherwise.
  • rookCephObjects - General information from Rook about the Ceph cluster health and current state. Contains the following fields:

    • cephCluster - Contains Ceph cluster status information.
    • blockStorage - Contains status information for block storage-related objects.
    • cephClients - Represents a key-value mapping of Ceph client names to their statuses.
    • objectStorage - Contains status information for object storage-related objects.
    • sharedFilesystems - Contains status information for shared filesystem-related objects.
    Example rookCephObjects status
    status:
      healthReport:
        rookCephObjects:
          cephCluster:
            state: <rook ceph cluster common status>
            phase: <rook ceph cluster spec reconcile phase>
            message: <rook ceph cluster phase details>
            conditions: <history of rook ceph cluster reconcile steps>
            ceph: <ceph cluster health>
            storage:
              deviceClasses: <list of used device classes in ceph cluster>
            version:
              image: <ceph image used in ceph cluster>
              version: <ceph version of ceph cluster>
          blockStorage:
            cephBlockPools:
              <cephBlockPoolName>:
                ...
                phase: <rook ceph block pool resource phase>
          cephClients:
            <cephClientName>:
              ...
              phase: <rook ceph client resource phase>
          objectStorage:
            cephObjectStore:
              <cephObjectStoreName>:
                ...
                phase: <rook ceph object store resource phase>
            cephObjectStoreUsers:
              <rgwUserName>:
                ...
                phase: <rook ceph object store user resource phase>
            objectBucketClaims:
              <bucketName>:
                ...
                phase: <rook ceph object bucket claims resource phase>
            cephObjectRealms:
              <realmName>:
                ...
                phase: <rook ceph object store realm resource phase>
            cephObjectZoneGroups:
              <zonegroupName>:
                ...
                phase: <rook ceph object store zonegroup resource phase>
            cephObjectZones:
              <zoneName>:
                ...
                phase: <rook ceph object store zone resource phase>
          sharedFilesystems:
            cephFilesystems:
              <cephFSName>:
                ...
                phase: <rook ceph filesystem resource phase>
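    The per-object phase fields above can be checked mechanically. The sketch below is illustrative only: it recurses over whatever object maps are present rather than assuming a fixed schema, and the "Failure" phase value in the sample data is a hypothetical, not taken from the example output above.

```python
# Walk the rookCephObjects subtree and collect any Rook-managed object
# whose `phase` is not "Ready".

def not_ready(objects: dict, path: str = "rookCephObjects") -> list[str]:
    bad = []
    for key, value in objects.items():
        if not isinstance(value, dict):
            continue  # skip scalar fields such as `state` or `message`
        here = f"{path}.{key}"
        phase = value.get("phase")
        if phase is not None and phase != "Ready":
            bad.append(f"{here}: phase={phase}")
        bad.extend(not_ready(value, here))
    return bad

example = {
    "blockStorage": {"cephBlockPools": {"kubernetes-hdd": {"phase": "Ready"}}},
    # "Failure" here is a hypothetical phase value for illustration:
    "sharedFilesystem": {"cephFilesystems": {"cephfs-store": {"phase": "Failure"}}},
}
print(not_ready(example))
# -> ['rookCephObjects.sharedFilesystem.cephFilesystems.cephfs-store: phase=Failure']
```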
    
  • cephDaemons - Contains information about the state of the Ceph and Ceph CSI daemons in the cluster. Includes the following fields:

    • cephDaemons - Map of statuses for each Ceph daemon type in the cluster. Indicates the expected and actual number of Ceph daemons. Available daemon types are: mds, mgr, mon, osd, and rgw.
    • cephCSIPluginDaemons - Contains information, in the same format, for each Ceph CSI plugin deployed in the Ceph cluster: rbd and cephfs.
    Example cephDaemons status
    status:
      healthReport:
        cephDaemons:
          cephCSIPluginDaemons:
            csi-cephfsplugin:
              info:
              - 3/3 ready
              status: ok
            csi-rbdplugin:
              info:
              - 3/3 ready
              status: ok
          cephDaemons:
            mds:
              info:
              - 'mds active: 1/1 (cephfs ''cephfs-store'')'
              status: ok
            mgr:
              info:
              - 'a is active mgr, standbys: [b]'
              status: ok
            mon:
              info:
              - 3 mons, quorum [a b c]
              status: ok
            osd:
              info:
              - 3 osds, 3 up, 3 in
              status: ok
            rgw:
              info:
              - '2 rgws running, daemons: [21273 38213]'
              status: ok
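    A small sketch of how this map can be summarized, assuming only the `status`/`info` shape shown above (the "failed" status value in the sample data is an assumption for illustration; the example output only shows "ok"):

```python
# Summarize the cephDaemons / cephCSIPluginDaemons maps shown above.
# Each entry has the shape {"status": "...", "info": [...]}; anything
# whose status is not "ok" is reported together with its info lines.

def unhealthy(daemons: dict) -> dict:
    return {
        name: state.get("info", [])
        for name, state in daemons.items()
        if state.get("status") != "ok"
    }

daemons = {
    "mon": {"info": ["3 mons, quorum [a b c]"], "status": "ok"},
    "osd": {"info": ["3 osds, 2 up, 3 in"], "status": "failed"},  # hypothetical
}
print(unhealthy(daemons))  # -> {'osd': ['3 osds, 2 up, 3 in']}
```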
    
  • clusterDetails - Verbose details of the Ceph cluster state. Contains the following fields:

    • usageDetails - Describes the used, available, and total storage size for each device class and pool.
    • cephEvents - Contains information about Ceph events currently happening in the cluster, if the progress events module is enabled.
    • rgwInfo - Represents additional Ceph Object Storage multisite information, such as the public endpoint for connecting an external zone, and sync statuses.
    Example clusterDetails status
    status:
      healthReport:
        clusterDetails:
          cephEvents:
            PgAutoscalerDetails:
              state: Idle
            rebalanceDetails:
              state: Idle
          rgwInfo:
            publicEndpoint: https://192.10.1.101:443
          usageDetails:
            deviceClasses:
              hdd:
                availableBytes: "159681224704"
                totalBytes: "161048690688"
                usedBytes: "1367465984"
            pools:
              .mgr:
                availableBytes: "75660169216"
                totalBytes: "75661557760"
                usedBytes: "1388544"
                usedBytesPercentage: "0.001"
              .rgw.root:
                availableBytes: "75661426688"
                totalBytes: "75661557760"
                usedBytes: "131072"
                usedBytesPercentage: "0.000"
              cephfs-store-cephfs-pool-1:
                availableBytes: "75661557760"
                totalBytes: "75661557760"
                usedBytes: "0"
                usedBytesPercentage: "0.000"
              cephfs-store-metadata:
                availableBytes: "75660517376"
                totalBytes: "75661557760"
                usedBytes: "1040384"
                usedBytesPercentage: "0.001"
              kubernetes-hdd:
                availableBytes: "75661549568"
                totalBytes: "75661557760"
                usedBytes: "8192"
                usedBytesPercentage: "0.000"
              rgw-store.rgw.buckets.data:
                availableBytes: "75661557760"
                totalBytes: "75661557760"
                usedBytes: "0"
                usedBytesPercentage: "0.000"
              ...
              rgw-store.rgw.otp:
                availableBytes: "75661557760"
                totalBytes: "75661557760"
                usedBytes: "0"
                usedBytesPercentage: "0.000"
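    As a small worked example of the usageDetails numbers: usedBytesPercentage corresponds to usedBytes / totalBytes expressed as a percentage. In the sample output the values appear truncated (not rounded) to three decimal places; that truncation is an observation from this example, not documented behavior. A sketch:

```python
# Recompute usedBytesPercentage from the string-encoded byte counters in
# usageDetails. The three-decimal truncation matches the sample output
# above; whether the controller truncates or rounds is an assumption here.

def used_pct(used_bytes: str, total_bytes: str) -> str:
    used, total = int(used_bytes), int(total_bytes)
    pct = used / total * 100                # fraction used, as a percentage
    return f"{int(pct * 1000) / 1000:.3f}"  # truncate to three decimals

# Values taken from the .mgr pool in the example above:
print(used_pct("1388544", "75661557760"))  # -> 0.001
```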
    
  • osdAnalysis - Ceph OSD analysis results based on the Rook CephCluster specification and disk daemon reports. Contains the following fields:

    • diskDaemon - Disk daemon status. The disk daemon is the Pelagia LCM component that provides information about node devices and their usage by Ceph OSDs.
    • cephClusterSpecGeneration - Last validated Rook CephCluster specification generation.
    • specAnalysis - Map of per-node analysis results based on the Rook CephCluster specification.
    Example osdAnalysis status
    status:
      healthReport:
        osdAnalysis:
          cephClusterSpecGeneration: 1
          diskDaemon:
            info:
            - 3/3 ready
            status: ok
          specAnalysis:
            cluster-storage-worker-0:
              status: ok
            cluster-storage-worker-1:
              status: ok
            cluster-storage-worker-2:
              status: ok
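    The per-node specAnalysis map can be reduced to a single pass/fail view. A minimal sketch, assuming only the `status` field shown above (the "failed" value in the sample data is hypothetical; treating anything other than "ok" as a problem is the conservative choice):

```python
# Reduce osdAnalysis.specAnalysis to the list of nodes needing attention.

def problem_nodes(spec_analysis: dict) -> list[str]:
    return sorted(
        node for node, result in spec_analysis.items()
        if result.get("status") != "ok"
    )

analysis = {
    "cluster-storage-worker-0": {"status": "ok"},
    "cluster-storage-worker-1": {"status": "failed"},  # hypothetical value
    "cluster-storage-worker-2": {"status": "ok"},
}
print(problem_nodes(analysis))  # -> ['cluster-storage-worker-1']
```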