# Verify Pelagia Controllers and Rook Ceph Operator
The starting point for Pelagia, Rook, and Ceph troubleshooting is the logs of the Pelagia Controllers and the Rook Ceph Operator. Once you locate the component that causes the issue, verify the logs of the related pod. This section describes how to verify the Pelagia Controllers and the Rook objects of a Ceph cluster.
## Verify Pelagia and Rook
1. Verify that the status of each pod in the Pelagia and Rook namespaces is
   `Running`:

   - For `pelagia`:

     ```bash
     kubectl -n pelagia get pod
     ```

   - For `rook-ceph`:

     ```bash
     kubectl -n rook-ceph get pod
     ```
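   To list only the pods that are not in the `Running` phase, you can add a
   field selector. This is plain kubectl, not a Pelagia-specific command;
   note that it also lists `Succeeded` pods, such as completed jobs:

   ```bash
   kubectl -n rook-ceph get pod --field-selector=status.phase!=Running
   ```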
2. Verify the Pelagia Deployment Controller, which prepares the configuration
   that Rook uses to deploy the Ceph cluster and is managed through the
   `CephDeployment` custom resource (CR):

   1. List the pods:

      ```bash
      kubectl -n pelagia get pods
      ```

   2. Verify the logs of the required pod:

      ```bash
      kubectl -n pelagia logs <pelagia-deployment-controller-pod-name>
      ```

   3. Verify the configuration:

      ```bash
      kubectl -n pelagia get cephdpl -o yaml
      ```

   If Rook cannot finish the deployment, verify the Rook Ceph Operator logs
   as described in the following step.
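   Alternatively, you can stream the controller logs through its Deployment
   object instead of looking up a pod name. The Deployment name
   `pelagia-deployment-controller` used below is an assumption and may differ
   in your installation:

   ```bash
   kubectl -n pelagia logs deploy/pelagia-deployment-controller --tail=100 -f
   ```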
3. Verify the Rook Ceph Operator logs. Rook deploys a Ceph cluster based on
   the custom resources created by the Pelagia Deployment Controller, such as
   `cephblockpools`, `cephclients`, `cephcluster`, and so on. The Rook Ceph
   Operator logs contain details about component orchestration.

   1. Verify the Rook Ceph Operator logs:

      ```bash
      kubectl -n rook-ceph logs -l app=rook-ceph-operator
      ```

   2. Verify the `CephCluster` configuration:

      Note: In Pelagia, `CephDeployment` manages the `CephCluster` CR. Use
      the `CephCluster` CR only for verification and do not modify it
      manually.

      ```bash
      kubectl get cephcluster -n rook-ceph -o yaml
      ```

   For details about the Ceph cluster status and to get access to CLI tools,
   connect to the `pelagia-ceph-toolbox` pod as described in the following
   step.
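   The operator log can be long. A quick way to narrow it down is to filter
   for error lines and to check the top-level `CephCluster` status. The
   status field layout below follows the upstream Rook CR and may vary
   between Rook versions:

   ```bash
   # Show only error-like lines from the recent operator log
   kubectl -n rook-ceph logs -l app=rook-ceph-operator --tail=500 | grep -iE 'error|fail'

   # Print the reported phase and Ceph health of the CephCluster CR
   kubectl -n rook-ceph get cephcluster \
     -o jsonpath='{.items[0].status.phase}{"\n"}{.items[0].status.ceph.health}{"\n"}'
   ```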
4. Verify the `pelagia-ceph-toolbox` pod:

   1. Execute the `pelagia-ceph-toolbox` pod:

      ```bash
      kubectl -n rook-ceph exec -it deploy/pelagia-ceph-toolbox -- bash
      ```

   2. Verify that CLI commands can run on the `pelagia-ceph-toolbox` pod:

      ```bash
      ceph -s
      ```
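   Beyond `ceph -s`, a few standard Ceph commands run from the same shell
   give a quick picture of cluster health. These are upstream Ceph CLI
   calls, not Pelagia-specific ones:

   ```bash
   ceph health detail   # expand any WARN/ERR conditions reported by ceph -s
   ceph df              # pool and raw capacity usage
   ceph osd stat        # number of OSDs and how many are up and in
   ```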
5. Verify hardware:

   1. Through the `pelagia-ceph-toolbox` pod, obtain the required device in
      your cluster:

      ```bash
      ceph osd tree
      ```

   2. Enter all Ceph OSD pods in the `rook-ceph` namespace one by one:

      ```bash
      kubectl exec -it -n rook-ceph <osd-pod-name> -- bash
      ```

   3. Verify that the `ceph-volume` tool is available on all pods running on
      the target node:

      ```bash
      ceph-volume lvm list
      ```
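   To avoid an interactive shell, you can run the same check in one step per
   OSD pod; this is plain `kubectl exec` usage:

   ```bash
   kubectl -n rook-ceph exec <osd-pod-name> -- ceph-volume lvm list
   ```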
6. Verify data access. Ceph volumes can be consumed directly by Kubernetes
   workloads and internally, for example, by OpenStack services. To verify
   the Kubernetes storage:

   1. Verify the available storage classes. The storage classes that are
      automatically managed by Ceph Controller use the
      `rook-ceph.rbd.csi.ceph.com` provisioner:

      ```bash
      kubectl get storageclass
      ```

      Example of system response:

      ```
      NAME                       PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
      kubernetes-ssd (default)   rook-ceph.rbd.csi.ceph.com   Delete          Immediate           false                  55m
      ```
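      To check only the provisioner of a specific class, you can query the
      field directly; `kubernetes-ssd` is the class name from the example
      above, so substitute your own:

      ```bash
      kubectl get storageclass kubernetes-ssd -o jsonpath='{.provisioner}{"\n"}'
      ```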
   2. Verify that volumes are properly connected to the Pod:

      1. Obtain the list of volumes in all namespaces or in a particular one:

         ```bash
         kubectl get persistentvolumeclaims -A
         ```

         Example of system response:

         ```
         NAMESPACE   NAME       STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS     AGE
         rook-ceph   app-test   Bound    pv-test   1Gi        RWO            kubernetes-ssd   11m
         ```

      2. For each volume, verify the connection. For example:

         ```bash
         kubectl describe pvc app-test -n rook-ceph
         ```

         Example of a positive system response:

         ```
         Name:          app-test
         Namespace:     rook-ceph
         StorageClass:  kubernetes-ssd
         Status:        Bound
         Volume:        pv-test
         Labels:        <none>
         Annotations:   pv.kubernetes.io/bind-completed: yes
                        pv.kubernetes.io/bound-by-controller: yes
                        volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
         Finalizers:    [kubernetes.io/pvc-protection]
         Capacity:      1Gi
         Access Modes:  RWO
         VolumeMode:    Filesystem
         Events:        <none>
         ```

         In case of connection issues, inspect the Pod description for the
         volume information:

         ```bash
         kubectl describe pod <crashloopbackoff-pod-name>
         ```

         Example of system response:

         ```
         ...
         Events:
           FirstSeen   LastSeen   Count   From                    SubObjectPath   Type      Reason             Message
           ---------   --------   -----   ----                    -------------   ----      ------             -------
           1h          1h         3       default-scheduler                       Warning   FailedScheduling   PersistentVolumeClaim is not bound: "app-test" (repeated 2 times)
           1h          35s        36      kubelet, 172.17.8.101                   Warning   FailedMount        Unable to mount volumes for pod "wordpress-mysql-918363043-50pjr_default(08d14e75-bd99-11e7-bc4c-001c428b9fc8)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-mysql-918363043-50pjr". list of unattached/unmounted volumes=[mysql-persistent-storage]
           1h          35s        36      kubelet, 172.17.8.101                   Warning   FailedSync         Error syncing pod
         ```
   3. Verify that the CSI provisioner plugins started properly and are in the
      `Running` status:

      1. Obtain the list of CSI provisioner plugins:

         ```bash
         kubectl -n rook-ceph get pod -l app=csi-rbdplugin-provisioner
         ```

      2. Verify the logs of the required CSI provisioner:

         ```bash
         kubectl -n rook-ceph logs <csi-provisioner-plugin-name> -c csi-provisioner
         ```
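   As an end-to-end check, you can create a small test PVC against the
   storage class and confirm that it reaches the `Bound` status. This is a
   generic Kubernetes sketch; the names `pvc-test` and `kubernetes-ssd` are
   illustrative, so substitute your own:

   ```bash
   # Create a minimal 1Gi claim against the Ceph-backed storage class
   kubectl apply -f - <<'EOF'
   apiVersion: v1
   kind: PersistentVolumeClaim
   metadata:
     name: pvc-test
   spec:
     accessModes:
       - ReadWriteOnce
     resources:
       requests:
         storage: 1Gi
     storageClassName: kubernetes-ssd
   EOF

   # Expect STATUS to reach Bound, then remove the test claim
   kubectl get pvc pvc-test
   kubectl delete pvc pvc-test
   ```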