# Verify Pelagia Controllers and Rook Ceph Operator
The starting point for Pelagia, Rook, and Ceph troubleshooting is the logs of the Pelagia Controllers and the Rook Ceph Operator. Once you locate the component that causes the issue, verify the logs of the related pod. This section describes how to verify the Pelagia Controllers and the Rook objects of a Ceph cluster.
## Verify Pelagia and Rook
1. Verify that the status of each pod in the Pelagia and Rook namespaces is
   `Running`:

   - For `pelagia`:

     ```bash
     kubectl -n pelagia get pod
     ```

   - For `rook-ceph`:

     ```bash
     kubectl -n rook-ceph get pod
     ```
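   To list only the pods that are not in the `Running` phase, you can add a
   field selector. This is plain kubectl, not a Pelagia-specific command;
   note that it also lists `Succeeded` pods, such as completed jobs:

   ```bash
   kubectl -n rook-ceph get pod --field-selector=status.phase!=Running
   ```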
2. Verify the Pelagia Deployment Controller, which prepares the configuration
   that Rook uses to deploy the Ceph cluster and is managed through the
   `CephDeployment` custom resource (CR):

   1. List the pods:

      ```bash
      kubectl -n pelagia get pods
      ```

   2. Verify the logs of the required pod:

      ```bash
      kubectl -n pelagia logs <pelagia-deployment-controller-pod-name>
      ```

   3. Verify the configuration:

      ```bash
      kubectl -n pelagia get cephdpl -o yaml
      ```

   If Rook cannot finish the deployment, verify the Rook Ceph Operator logs
   as described in the following step.
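   Alternatively, you can stream the controller logs through its Deployment
   object instead of looking up a pod name. The Deployment name
   `pelagia-deployment-controller` used below is an assumption and may differ
   in your installation:

   ```bash
   kubectl -n pelagia logs deploy/pelagia-deployment-controller --tail=100 -f
   ```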
3. Verify the Rook Ceph Operator logs. Rook deploys a Ceph cluster based on
   the custom resources created by the Pelagia Deployment Controller, such as
   `cephblockpools`, `cephclients`, `cephcluster`, and so on. The Rook Ceph
   Operator logs contain details about component orchestration.

   1. Verify the Rook Ceph Operator logs:

      ```bash
      kubectl -n rook-ceph logs -l app=rook-ceph-operator
      ```

   2. Verify the `CephCluster` configuration:

      Note: In Pelagia, `CephDeployment` manages the `CephCluster` CR. Use
      the `CephCluster` CR only for verification and do not modify it
      manually.

      ```bash
      kubectl get cephcluster -n rook-ceph -o yaml
      ```

   For details about the Ceph cluster status and to get access to CLI tools,
   connect to the `pelagia-ceph-toolbox` pod as described in the following
   step.
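   The operator log can be long. A quick way to narrow it down is to filter
   for error lines and to check the top-level `CephCluster` status. The
   status field layout below follows the upstream Rook CR and may vary
   between Rook versions:

   ```bash
   # Show only error-like lines from the recent operator log
   kubectl -n rook-ceph logs -l app=rook-ceph-operator --tail=500 | grep -iE 'error|fail'

   # Print the reported phase and Ceph health of the CephCluster CR
   kubectl -n rook-ceph get cephcluster \
     -o jsonpath='{.items[0].status.phase}{"\n"}{.items[0].status.ceph.health}{"\n"}'
   ```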
4. Verify the `pelagia-ceph-toolbox` pod:

   1. Execute the `pelagia-ceph-toolbox` pod:

      ```bash
      kubectl -n rook-ceph exec -it deploy/pelagia-ceph-toolbox -- bash
      ```

   2. Verify that CLI commands can run on the `pelagia-ceph-toolbox` pod:

      ```bash
      ceph -s
      ```
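   Beyond `ceph -s`, a few standard Ceph commands run from the same shell
   give a quick picture of cluster health. These are upstream Ceph CLI
   calls, not Pelagia-specific ones:

   ```bash
   ceph health detail   # expand any WARN/ERR conditions reported by ceph -s
   ceph df              # pool and raw capacity usage
   ceph osd stat        # number of OSDs and how many are up and in
   ```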
5. Verify hardware:

   1. Through the `pelagia-ceph-toolbox` pod, obtain the required device in
      your cluster:

      ```bash
      ceph osd tree
      ```

   2. Enter all Ceph OSD pods in the `rook-ceph` namespace one by one:

      ```bash
      kubectl exec -it -n rook-ceph <osd-pod-name> -- bash
      ```

   3. Verify that the `ceph-volume` tool is available on all pods running on
      the target node:

      ```bash
      ceph-volume lvm list
      ```
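   To avoid an interactive shell, you can run the same check in one step per
   OSD pod; this is plain `kubectl exec` usage:

   ```bash
   kubectl -n rook-ceph exec <osd-pod-name> -- ceph-volume lvm list
   ```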
6. Verify data access. Ceph volumes can be consumed directly by Kubernetes
   workloads and internally, for example, by OpenStack services. To verify
   the Kubernetes storage:

   1. Verify the available storage classes. The storage classes that are
      automatically managed by Ceph Controller use the
      `rook-ceph.rbd.csi.ceph.com` provisioner:

      ```bash
      kubectl get storageclass
      ```

      Example of system response:

      ```
      NAME                       PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
      kubernetes-ssd (default)   rook-ceph.rbd.csi.ceph.com   Delete          Immediate           false                  55m
      ```
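      To check only the provisioner of a specific class, you can query the
      field directly; `kubernetes-ssd` is the class name from the example
      above, so substitute your own:

      ```bash
      kubectl get storageclass kubernetes-ssd -o jsonpath='{.provisioner}{"\n"}'
      ```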
   2. Verify that volumes are properly connected to the Pod:

      1. Obtain the list of volumes in all namespaces or in a particular one:

         ```bash
         kubectl get persistentvolumeclaims -A
         ```

         Example of system response:

         ```
         NAMESPACE   NAME       STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS     AGE
         rook-ceph   app-test   Bound    pv-test   1Gi        RWO            kubernetes-ssd   11m
         ```

      2. For each volume, verify the connection. For example:

         ```bash
         kubectl describe pvc app-test -n rook-ceph
         ```

         Example of a positive system response:

         ```
         Name:          app-test
         Namespace:     rook-ceph
         StorageClass:  kubernetes-ssd
         Status:        Bound
         Volume:        pv-test
         Labels:        <none>
         Annotations:   pv.kubernetes.io/bind-completed: yes
                        pv.kubernetes.io/bound-by-controller: yes
                        volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
         Finalizers:    [kubernetes.io/pvc-protection]
         Capacity:      1Gi
         Access Modes:  RWO
         VolumeMode:    Filesystem
         Events:        <none>
         ```

         In case of connection issues, inspect the Pod description for the
         volume information:

         ```bash
         kubectl describe pod <crashloopbackoff-pod-name>
         ```

         Example of system response:

         ```
         ...
         Events:
           FirstSeen   LastSeen   Count   From                    SubObjectPath   Type      Reason             Message
           ---------   --------   -----   ----                    -------------   ----      ------             -------
           1h          1h         3       default-scheduler                       Warning   FailedScheduling   PersistentVolumeClaim is not bound: "app-test" (repeated 2 times)
           1h          35s        36      kubelet, 172.17.8.101                   Warning   FailedMount        Unable to mount volumes for pod "wordpress-mysql-918363043-50pjr_default(08d14e75-bd99-11e7-bc4c-001c428b9fc8)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-mysql-918363043-50pjr". list of unattached/unmounted volumes=[mysql-persistent-storage]
           1h          35s        36      kubelet, 172.17.8.101                   Warning   FailedSync         Error syncing pod
         ```
   3. Verify that the CSI provisioner plugins started properly and are in the
      `Running` status:

      1. Obtain the list of CSI provisioner plugins:

         ```bash
         kubectl -n rook-ceph get pod -l app=csi-rbdplugin-provisioner
         ```

      2. Verify the logs of the required CSI provisioner:

         ```bash
         kubectl -n rook-ceph logs <csi-provisioner-plugin-name> -c csi-provisioner
         ```
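   As an end-to-end check, you can create a small test PVC against the
   storage class and confirm that it reaches the `Bound` status. This is a
   generic Kubernetes sketch; the names `pvc-test` and `kubernetes-ssd` are
   illustrative, so substitute your own:

   ```bash
   # Create a minimal 1Gi claim against the Ceph-backed storage class
   kubectl apply -f - <<'EOF'
   apiVersion: v1
   kind: PersistentVolumeClaim
   metadata:
     name: pvc-test
   spec:
     accessModes:
       - ReadWriteOnce
     resources:
       requests:
         storage: 1Gi
     storageClassName: kubernetes-ssd
   EOF

   # Expect STATUS to reach Bound, then remove the test claim
   kubectl get pvc pvc-test
   kubectl delete pvc pvc-test
   ```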