Add, remove, or reconfigure Ceph nodes#
Pelagia Lifecycle Management (LCM) Controller simplifies Ceph cluster management by automating LCM operations. This section describes how to add, remove, or reconfigure Ceph nodes.
Note
When adding a Ceph node with the Ceph Monitor role, if any issues occur with the Ceph Monitor, `rook-ceph` removes it and adds a new Ceph Monitor instead, named using the next alphabetic character in order. Therefore, the Ceph Monitor names may not follow the alphabetical order. For example, `a`, `b`, `d`, instead of `a`, `b`, `c`.
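To check the current Ceph Monitor names, you can list the Ceph Monitor pods in the Rook namespace. A minimal sketch, assuming Rook runs in its default `rook-ceph` namespace and uses the standard `app=rook-ceph-mon` pod label:

```bash
# The Ceph Monitor letter (a, b, d, and so on) is part of each pod name
kubectl -n rook-ceph get pods -l app=rook-ceph-mon
```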
Add a Ceph node#
- Prepare a new node for the cluster.
- Open the `CephDeployment` custom resource (CR) for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the `nodes` section, specify the parameters for a Ceph node as required. For the parameter description, see CephDeployment: Nodes parameters.

  The example configuration of the `nodes` section with the new node:

  ```yaml
  nodes:
  - name: storage-worker-414
    roles:
    - mon
    - mgr
    devices:
    - config:
        deviceClass: hdd
      fullPath: /dev/disk/by-id/scsi-SATA_HGST_HUS724040AL_PN1334PEHN18ZS
  ```
  You can also add a new node with device filters. For example:

  ```yaml
  nodes:
  - name: storage-worker-414
    roles:
    - mon
    - mgr
    config:
      deviceClass: hdd
    devicePathFilter: "^/dev/disk/by-id/scsi-SATA_HGST+*"
  ```
  Warning

  We highly recommend using the non-wwn `by-id` symlinks to specify storage devices in the `devices` list. For details, see Architecture: Addressing Ceph devices.
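  To pick a suitable non-wwn `by-id` symlink for a device, you can, for example, list the symlinks on the node and filter out the `wwn-` entries. A sketch, assuming shell access to the node:

  ```bash
  # Show persistent by-id symlinks and the devices they point to, excluding wwn-based ones
  ls -l /dev/disk/by-id/ | grep -v wwn-
  ```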
  Note

  - To use a new Ceph node for a Ceph Monitor or Ceph Manager deployment, also specify the `roles` parameter.
  - Reducing the number of Ceph Monitors is not supported and causes removal of the Ceph Monitor daemons from random nodes.
  - Removal of the `mgr` role in the `nodes` section of the `CephDeployment` CR does not remove Ceph Managers. To remove a Ceph Manager from a node, remove it from the `nodes` spec and manually delete the `mgr` pod in the Rook namespace.
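  The following sketch shows one way to delete a Ceph Manager pod manually. It assumes Rook runs in the default `rook-ceph` namespace and labels Manager pods with `app=rook-ceph-mgr`; adjust both to your deployment:

  ```bash
  # Find the Manager pod running on the node in question, then delete it
  kubectl -n rook-ceph get pods -l app=rook-ceph-mgr -o wide
  kubectl -n rook-ceph delete pod <rook-ceph-mgr-pod-name>
  ```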
- Verify that all new Ceph daemons for the specified node have been successfully deployed in the Ceph cluster. The `status.healthReport.cephDaemons.cephDaemons` field of the `CephDeploymentHealth` CR should not contain any issues.

  ```bash
  kubectl -n pelagia get cephdeploymenthealth -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    healthReport:
      cephDaemons:
        cephDaemons:
          mgr:
            info:
            - 'a is active mgr, standbys: [b]'
            status: ok
          mon:
            info:
            - 3 mons, quorum [a b c]
            status: ok
          osd:
            info:
            - 3 osds, 3 up, 3 in
            status: ok
  ```
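  As an additional cross-check from the Ceph side, you can query the cluster status directly. A sketch, assuming a Rook toolbox deployment named `rook-ceph-tools` exists in the `rook-ceph` namespace (names may differ in your deployment):

  ```bash
  # Confirm that the new daemons and OSDs are up from the Ceph point of view
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
  ```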
Remove a Ceph node#
Note
Ceph node removal relies on the `CephOsdRemoveTask` CR. For a workflow overview, see High-level workflow of Ceph OSD or node removal.
Note
To remove a Ceph node with a `mon` role, first move the Ceph Monitor to another node and remove the `mon` role from the Ceph node as described in Move a Ceph Monitor daemon to another node.
- Open the `CephDeployment` CR for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the `nodes` section, remove the required Ceph node specification. For example:

  ```yaml
  spec:
    nodes:
    - name: storage-worker-5 # remove the entire entry for the required node
      devices: {...}
      roles: [...]
  ```
- Create a YAML template for the `CephOsdRemoveTask` CR. For example:

  ```yaml
  apiVersion: lcm.mirantis.com/v1alpha1
  kind: CephOsdRemoveTask
  metadata:
    name: remove-osd-worker-5
    namespace: pelagia
  spec:
    nodes:
      storage-worker-5:
        completeCleanUp: true
  ```
- Apply the template on the Rockoon cluster:

  ```bash
  kubectl apply -f remove-osd-worker-5.yaml
  ```
- Verify that the corresponding task has been created:

  ```bash
  kubectl -n pelagia get cephosdremovetask remove-osd-worker-5
  ```
- Verify that the `removeInfo` section appeared in the `CephOsdRemoveTask` CR `status`:

  ```bash
  kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    removeInfo:
      cleanupMap:
        storage-worker-5:
          osdMapping:
            "10":
              deviceMapping:
                sdb:
                  path: "/dev/disk/by-path/pci-0000:00:1t.9"
                  partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdb"
                  type: "block"
                  class: "hdd"
                  zapDisk: true
            "16":
              deviceMapping:
                sdc:
                  path: "/dev/disk/by-path/pci-0000:00:1t.10"
                  partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdc"
                  type: "block"
                  class: "hdd"
                  zapDisk: true
  ```
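  To review only the cleanup plan instead of the full CR, you can, for example, extract the relevant field with a JSONPath query (the field names are taken from the response above):

  ```bash
  # Print only the planned OSD-to-device cleanup mapping
  kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 \
    -o jsonpath='{.status.removeInfo.cleanupMap}'
  ```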
- Verify that the `cleanupMap` section matches the required removal and wait for the `ApproveWaiting` phase to appear in `status`:

  ```bash
  kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    phase: ApproveWaiting
  ```
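  If you prefer not to re-run the command manually, a small polling loop can wait for the phase. A minimal sketch, assuming a Bash shell with access to the cluster:

  ```bash
  # Poll the task phase every 10 seconds until it reaches ApproveWaiting
  until [ "$(kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 \
    -o jsonpath='{.status.phase}')" = "ApproveWaiting" ]; do
    sleep 10
  done
  ```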
- Edit the `CephOsdRemoveTask` CR and set the `approve` flag to `true`:

  ```bash
  kubectl -n pelagia edit cephosdremovetask remove-osd-worker-5
  ```

  For example:

  ```yaml
  spec:
    approve: true
  ```
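  Alternatively, the flag can be set non-interactively with a merge patch, which may be convenient for scripting. A sketch using the same CR name as above:

  ```bash
  # Approve the removal task without opening an editor
  kubectl -n pelagia patch cephosdremovetask remove-osd-worker-5 \
    --type merge -p '{"spec":{"approve":true}}'
  ```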
- Review the status of the `CephOsdRemoveTask` resource processing. The key parameters are as follows:

  - `status.phase` - the current state of task processing
  - `status.messages` - the description of the current phase
  - `status.conditions` - the full history of task processing before the current phase
  - `status.removeInfo.issues` and `status.removeInfo.warnings` - error and warning messages that occurred during task processing
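  For a quick look at these fields without scrolling through the full CR, you can, for example, query them directly with JSONPath:

  ```bash
  # Print the current phase and any issues or warnings reported by the task
  kubectl -n pelagia get cephosdremovetask remove-osd-worker-5 \
    -o jsonpath='{.status.phase}{"\n"}{.status.removeInfo.issues}{"\n"}{.status.removeInfo.warnings}{"\n"}'
  ```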
- Verify that the `CephOsdRemoveTask` has been completed. For example:

  ```yaml
  status:
    phase: Completed # or CompletedWithWarnings if there are non-critical issues
  ```
- Remove the device cleanup jobs:

  ```bash
  kubectl delete jobs -n pelagia -l app=pelagia-lcm-cleanup-disks
  ```
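  If you want to confirm that the cleanup jobs have finished before removing them, you can first list them using the same label selector; completed jobs show full counts in the COMPLETIONS column:

  ```bash
  # Check that all disk cleanup jobs have completed
  kubectl -n pelagia get jobs -l app=pelagia-lcm-cleanup-disks
  ```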
Reconfigure a Ceph node#
There is no hot reconfiguration procedure for existing Ceph OSDs and Ceph Monitors. To reconfigure an existing Ceph node, follow the steps below:
- Remove the Ceph node from the Ceph cluster as described in Remove a Ceph node.
- Add the same Ceph node but with a modified configuration as described in Add a Ceph node.