# Add, remove, or reconfigure Ceph nodes
Pelagia Lifecycle Management (LCM) Controller simplifies Ceph cluster management by automating LCM operations. This section describes how to add, remove, or reconfigure Ceph nodes.
Note
When adding a Ceph node with the Ceph Monitor role, if any issues occur with
the Ceph Monitor, rook-ceph removes it and adds a new Ceph Monitor instead,
named using the next alphabetic character in order. Therefore, the Ceph Monitor
names may not follow alphabetical order, for example, a, b, d instead of
a, b, c.
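To see which Ceph Monitor names are currently in use, you can list the monitor pods. This is a minimal sketch that assumes the Rook operator runs in the default `rook-ceph` namespace:

```bash
# List Ceph Monitor pods; the monitor letter is part of the pod name,
# for example rook-ceph-mon-a-..., rook-ceph-mon-b-..., rook-ceph-mon-d-...
kubectl -n rook-ceph get pods -l app=rook-ceph-mon
```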
## Add a Ceph node
- Prepare a new node for the cluster.
- Open the `CephDeployment` custom resource (CR) for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the `nodes` section, specify the parameters for a Ceph node as required.
  For the parameter description, see CephDeployment: Nodes parameters.

  The example configuration of the `nodes` section with the new node:

  ```yaml
  nodes:
  - name: storage-worker-414
    roles:
    - mon
    - mgr
    devices:
    - config:
        deviceClass: hdd
      fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
  ```

  You can also add a new node with device filters. For example:

  ```yaml
  nodes:
  - name: storage-worker-414
    roles:
    - mon
    - mgr
    config:
      deviceClass: hdd
    devicePathFilter: "^/dev/disk/by-id/scsi-SATA_HGST+*"
  ```

  Warning
  We highly recommend using the non-wwn `by-id` symlinks to specify storage
  devices in the `devices` list. For details, see Architecture: Addressing Ceph
  devices.

  Note

  - To use a new Ceph node for a Ceph Monitor or Ceph Manager deployment, also
    specify the `roles` parameter.
  - Reducing the number of Ceph Monitors is not supported and causes removal of
    Ceph Monitor daemons from random nodes.
  - Removal of the `mgr` role in the `nodes` section of the `CephDeployment` CR
    does not remove Ceph Managers. To remove a Ceph Manager from a node, remove
    it from the `nodes` spec and manually delete the `mgr` pod in the Rook
    namespace.
- Verify that all new Ceph daemons for the specified node have been successfully
  deployed in the Ceph cluster. The `CephDeploymentHealth` CR
  `status.healthReport.cephDaemons.cephDaemons` section should not contain any
  issues.

  ```bash
  kubectl -n pelagia get cephdeploymenthealth -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    healthReport:
      cephDaemons:
        cephDaemons:
          mgr:
            info:
            - 'a is active mgr, standbys: [b]'
            status: ok
          mon:
            info:
            - 3 mons, quorum [a b c]
            status: ok
          osd:
            info:
            - 3 osds, 3 up, 3 in
            status: ok
  ```
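After completing the procedure above, you can additionally confirm which Ceph daemon pods were scheduled on the new node. The following sketch assumes the default `rook-ceph` namespace used by Rook and the example node name `storage-worker-414`:

```bash
# List all Ceph daemon pods (mon, mgr, osd) running on the new node;
# the namespace and node name are assumptions based on the examples above.
kubectl -n rook-ceph get pods -o wide \
  --field-selector spec.nodeName=storage-worker-414
```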
## Remove a Ceph node
Note
Ceph node removal requires a `CephOsdRemoveTask` CR. For the workflow overview, see
High-level workflow of Ceph OSD or node removal.
Note
To remove a Ceph node with a mon role, first move the Ceph
Monitor to another node and remove the mon role from the Ceph node as
described in
Move a Ceph Monitor daemon to another node.
- Open the `CephDeployment` CR for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the `nodes` section, remove the required Ceph node specification.

  For example:

  ```yaml
  spec:
    nodes:
    - name: storage-worker-5 # remove the entire entry for the required node
      devices: {...}
      roles: [...]
  ```
- Create a YAML template for the `CephOsdRemoveTask` CR. For example:

  ```yaml
  apiVersion: lcm.mirantis.com/v1alpha1
  kind: CephOsdRemoveTask
  metadata:
    name: remove-osd-worker-5
    namespace: pelagia
  spec:
    nodes:
      storage-worker-5:
        completeCleanUp: true
  ```
- Apply the template on the Rockoon cluster:

  ```bash
  kubectl apply -f remove-osd-worker-5.yaml
  ```
- Verify that the corresponding task has been created:

  ```bash
  kubectl -n pelagia get cephosdremovetask remove-osd-worker-5
  ```
- Verify that the `removeInfo` section appeared in the `CephOsdRemoveTask` CR
  `status`:

  ```bash
  kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    removeInfo:
      cleanupMap:
        storage-worker-5:
          osdMapping:
            "10":
              deviceMapping:
                sdb:
                  path: "/dev/disk/by-path/pci-0000:00:1t.9"
                  partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdb"
                  type: "block"
                  class: "hdd"
                  zapDisk: true
            "16":
              deviceMapping:
                sdc:
                  path: "/dev/disk/by-path/pci-0000:00:1t.10"
                  partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdc"
                  type: "block"
                  class: "hdd"
                  zapDisk: true
  ```
- Verify that the `cleanupMap` section matches the required removal and wait for
  the `ApproveWaiting` phase to appear in `status`:

  ```bash
  kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    phase: ApproveWaiting
  ```
- Edit the `CephOsdRemoveTask` CR and set the `approve` flag to `true`:

  ```bash
  kubectl -n pelagia edit cephosdremovetask remove-osd-worker-5
  ```

  For example:

  ```yaml
  spec:
    approve: true
  ```
- Review the status of the `CephOsdRemoveTask` resource processing. The valuable
  parameters are as follows:

  - `status.phase` - the current state of task processing
  - `status.messages` - the description of the current phase
  - `status.conditions` - the full history of task processing before the current
    phase
  - `status.removeInfo.issues` and `status.removeInfo.warnings` - error and
    warning messages that occurred during task processing
- Verify that the `CephOsdRemoveTask` has been completed. For example:

  ```yaml
  status:
    phase: Completed # or CompletedWithWarnings if there are non-critical issues
  ```
- Remove the device cleanup jobs:

  ```bash
  kubectl delete jobs -n pelagia -l app=pelagia-lcm-cleanup-disks
  ```
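While a `CephOsdRemoveTask` is being processed, you can poll its phase instead of reading the whole resource. This is a minimal sketch that reuses the task name from the examples above:

```bash
# Print only the current phase of the removal task,
# for example ApproveWaiting, Completed, or CompletedWithWarnings.
kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 \
  -o jsonpath='{.status.phase}{"\n"}'
```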
## Reconfigure a Ceph node
There is no hot reconfiguration procedure for existing Ceph OSDs and Ceph Monitors. To reconfigure an existing Ceph node, follow the steps below:
- Remove the Ceph node from the Ceph cluster as described in Remove a Ceph node.
- Add the same Ceph node but with a modified configuration as described in Add a Ceph node.