Creating a Ceph OSD remove task
The workflow of creating a Ceph OSD removal task includes the following steps:
1. Removing obsolete nodes or disks from the `spec.nodes` section of the `CephDeployment` custom resource (CR) as described in Architecture: CephDeployment nodes parameters.

   Note

   Record the names of the removed nodes and devices, or their paths, exactly as they were specified in `CephDeployment` for further use.
2. Creating a YAML template for the `CephOsdRemoveTask` CR. For details, see Architecture: CephOsdRemoveTask.

   - If `CephOsdRemoveTask` contains information about the Ceph OSDs to remove in the proper format, this information is validated to eliminate human error and prevent removal of the wrong Ceph OSDs.
   - If the `nodes` section of `CephOsdRemoveTask` is empty, the Pelagia LCM Controller automatically detects the Ceph OSDs to remove, if any. Auto-detection relies not only on the information provided in the Rook `CephCluster` CR but also on information from the Ceph cluster itself.

   Once validation or auto-detection completes, the full information about the Ceph OSDs to remove appears in the `CephOsdRemoveTask` object: the hosts they belong to, OSD IDs, disks, partitions, and so on. The task then moves to the `ApproveWaiting` phase and stays there until the cloud operator manually sets the `approve` flag in the spec.
3. Manually adding an affirmative `approve` flag to the `CephOsdRemoveTask` spec. Once the flag is set, Rook Ceph Operator and Pelagia Deployment Controller reconciliation pauses until the task is handled, and the task is processed as follows:

   - Regular Rook Ceph Operator orchestration stops, and Pelagia Deployment Controller pauses its reconciliation.
   - Ceph OSDs are removed.
   - Batch jobs run to clean up the devices, if possible.
   - Host information is removed from the Ceph cluster if the entire Ceph node is removed.
   - The task is marked with the appropriate result, including a description of any issues that occurred.

   Note

   If the task completes successfully, Rook Ceph Operator and Pelagia Deployment Controller reconciliation resumes. Otherwise, it remains paused until the issue is resolved.
4. Reviewing the Ceph OSD removal status. For details, see Architecture: CephOsdRemoveTask status.
5. Manually removing device cleanup jobs.

   Note

   Device cleanup jobs are not removed automatically. They are kept in the Pelagia namespace, along with pods that contain information about the executed actions. The jobs have the following labels:

   ```yaml
   labels:
     app: pelagia-lcm-cleanup-disks
     host: <HOST-NAME>
     osd: <OSD-ID>
     rook-cluster: <ROOK-CLUSTER-NAME>
   ```

   Additionally, jobs are labeled with the names of the disks to be cleaned up, such as `sdb=true`. You can remove a single job or a group of jobs using any of the labels above, such as host, disk, and so on.
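For illustration, a cleanup job for OSD 3 on the node `worker-3` that cleans up the `sdb` disk would carry labels similar to the following. All values here are hypothetical, including the `rook-ceph` cluster name, and the exact formatting of the `osd` value is an assumption; values are quoted because Kubernetes label values are strings.

```yaml
# Hypothetical labels of a single device cleanup job (illustrative values only)
labels:
  app: pelagia-lcm-cleanup-disks
  host: worker-3
  osd: "3"
  rook-cluster: rook-ceph
  sdb: "true"
```

Because these are ordinary Kubernetes labels, any combination of them works as a selector. For example, `kubectl -n pelagia delete jobs -l app=pelagia-lcm-cleanup-disks,host=worker-3` would delete all cleanup jobs for `worker-3`, while `-l sdb=true` would target jobs that cleaned up `sdb` on any host.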
Example of CephOsdRemoveTask resource

```yaml
apiVersion: lcm.mirantis.com/v1alpha1
kind: CephOsdRemoveTask
metadata:
  name: remove-osd-3-4-task
  namespace: pelagia
spec:
  nodes:
    worker-3:
      cleanupByDevice:
      - device: sdb
      - device: /dev/disk/by-path/pci-0000:00:1t.9
```
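When the task reaches the `ApproveWaiting` phase and the detected Ceph OSDs look correct, approve it by adding the `approve` flag to the spec. A minimal sketch of the task above after approval, assuming `approve` is a boolean field placed directly under `spec`:

```yaml
apiVersion: lcm.mirantis.com/v1alpha1
kind: CephOsdRemoveTask
metadata:
  name: remove-osd-3-4-task
  namespace: pelagia
spec:
  nodes:
    worker-3:
      cleanupByDevice:
      - device: sdb
      - device: /dev/disk/by-path/pci-0000:00:1t.9
  # Assumption: a boolean approve flag directly under spec starts the removal
  approve: true
```

Instead of editing the manifest, the flag could also be set with a merge patch, for example `kubectl -n pelagia patch cephosdremovetask remove-osd-3-4-task --type merge -p '{"spec":{"approve":true}}'`, assuming the CRD is addressable by the resource name `cephosdremovetask`.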
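CephOsdRemoveTask to find all Ceph OSDs ready for removal

With an empty `spec.nodes` section, the Pelagia LCM Controller auto-detects the Ceph OSDs to remove. Because this manifest uses `generateName`, create it with `kubectl create` rather than `kubectl apply`, which does not support generated names.

```yaml
apiVersion: lcm.mirantis.com/v1alpha1
kind: CephOsdRemoveTask
metadata:
  generateName: remove-osds
  namespace: pelagia
spec:
  nodes: {}
```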
SEE ALSO
CephOsdRemoveRequest failure with a timeout during rebalance