Creating a Ceph OSD remove task

The workflow of creating a Ceph OSD removal task includes the following steps:

  1. Removing obsolete nodes or disks from the spec.nodes section of the CephDeployment custom resource (CR) as described in Architecture: CephDeployment nodes parameters.

    Note

    Record the names of the removed nodes and devices, or the device paths, exactly as they were specified in CephDeployment. You will need them in the following steps.

  2. Creating a YAML template for the CephOsdRemoveTask CR. For details, see Architecture: CephOsdRemoveTask.

    • If CephOsdRemoveTask contains properly formatted information about the Ceph OSDs to remove, that information is validated to catch human error and prevent removal of the wrong Ceph OSD.
    • If the nodes section of CephOsdRemoveTask is empty, the Pelagia LCM Controller will automatically detect Ceph OSDs for removal, if any. Auto-detection is based not only on the information provided in the Rook CephCluster CR but also on the information from the Ceph cluster itself.

    Once the validation or auto-detection completes, the complete information about the Ceph OSDs to remove appears in the CephOsdRemoveTask object: the hosts they belong to, OSD IDs, disks, partitions, and so on. The task then moves to the ApproveWaiting phase and stays there until the cloud operator manually specifies the approve flag in the spec.

  3. Manually adding an affirmative approve flag in the CephOsdRemoveTask spec. Once done, reconciliation by the Pelagia controllers and Rook Ceph Operator pauses until the task is handled, and Pelagia performs the following actions:

    • Stops regular Rook Ceph Operator orchestration and pauses Pelagia Deployment Controller reconciliation.
    • Removes Ceph OSDs.
    • Runs batch jobs to clean up the device, if possible.
    • Removes host information from the Ceph cluster if the entire Ceph node is removed.
    • Marks the task with the appropriate result and a description of any issues that occurred.

    Note

    If the task completes successfully, Rook Ceph Operator and Pelagia Deployment Controller reconciliation resumes. Otherwise, it remains paused until the issue is resolved.

  4. Reviewing the Ceph OSD removal status. For details, see Architecture: CephOsdRemoveTask status.

  5. Manually removing device cleanup jobs.

    Note

    Device cleanup jobs are not removed automatically. They are kept in the Pelagia namespace along with the pods that contain information about the executed actions. The jobs have the following labels:

    labels:
      app: pelagia-lcm-cleanup-disks
      host: <HOST-NAME>
      osd: <OSD-ID>
      rook-cluster: <ROOK-CLUSTER-NAME>
    

    Additionally, each job is labeled with the names of the disks that it cleans up, for example, sdb=true. You can remove a single job or a group of jobs by selecting on any of the labels described above, such as the host or the disk name.
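
To make the label scheme concrete, the following sketch shows the metadata that a cleanup job might carry for a hypothetical Ceph OSD 3 on disk sdb of node worker-3. The host, OSD ID, and disk name are illustrative assumptions, not values from a real cluster. Label values are quoted where needed because Kubernetes label values are strings.

```yaml
# Illustrative fragment only: the values below are assumed for the example.
metadata:
  namespace: pelagia
  labels:
    app: pelagia-lcm-cleanup-disks
    host: worker-3                      # assumed node name
    osd: "3"                            # assumed Ceph OSD ID
    rook-cluster: <ROOK-CLUSTER-NAME>
    sdb: "true"                         # disk label in the <DISK-NAME>=true form
```

A label selector that combines several of these labels, such as host=worker-3,sdb=true, matches only the cleanup jobs for that disk on that node.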

Example of CephOsdRemoveTask resource

apiVersion: lcm.mirantis.com/v1alpha1
kind: CephOsdRemoveTask
metadata:
  name: remove-osd-3-4-task
  namespace: pelagia
spec:
  nodes:
    worker-3:
      cleanupByDevice:
      - device: sdb
      - device: /dev/disk/by-path/pci-0000:00:1t.9
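
Step 3 of the workflow adds an affirmative approve flag to the spec of the task. As a minimal sketch, assuming that the flag is a boolean spec.approve field (verify the exact field name and value in Architecture: CephOsdRemoveTask), the same task could look as follows once approved:

```yaml
apiVersion: lcm.mirantis.com/v1alpha1
kind: CephOsdRemoveTask
metadata:
  name: remove-osd-3-4-task
  namespace: pelagia
spec:
  # Assumed field name and value for the approve flag.
  # Add it only after reviewing the Ceph OSDs listed in the task.
  approve: true
  nodes:
    worker-3:
      cleanupByDevice:
      - device: sdb
      - device: /dev/disk/by-path/pci-0000:00:1t.9
```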

Example of CephOsdRemoveTask to find all Ceph OSDs ready for removal

apiVersion: lcm.mirantis.com/v1alpha1
kind: CephOsdRemoveTask
metadata:
  generateName: remove-osds
  namespace: pelagia
spec:
  nodes: {}

SEE ALSO

CephOsdRemoveRequest failure with a timeout during rebalance