Replace a failed Ceph OSD#
After a physical disk replacement, you can use the Pelagia Lifecycle Management (LCM) API to redeploy a failed Ceph OSD. The common flow of replacing a failed Ceph OSD is as follows:
- Remove the obsolete Ceph OSD from the Ceph cluster by device symlink, by device name, or by Ceph OSD ID.
- Add a new Ceph OSD on the new disk to the Ceph cluster.
Note

Ceph OSD removal requires a `CephOsdRemoveTask` CR. For a workflow overview, see High-level workflow of Ceph OSD or node removal.
Remove a failed Ceph OSD by device name, path, or ID #
Warning

The procedure below presupposes that the cloud operator knows the exact device name, `by-path`, or `by-id` of the replaced device, as well as the node on which the replacement occurred.
Warning

Removing a Ceph OSD by `by-path`, `by-id`, or device name is not supported if the device was physically removed from the node. In that case, use `cleanupByOsdId` instead. For details, see Remove a failed Ceph OSD by Ceph OSD ID.
Warning

We do not recommend setting a device name or a device `by-path` symlink in the `cleanupByDevice` field because these identifiers are not persistent and can change at node boot. Remove Ceph OSDs using `by-id` symlinks or use `cleanupByOsdId` instead. For details, see Architecture: Addressing Ceph devices.
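To find a persistent `by-id` symlink for a device before filling in `cleanupByDevice`, you can inspect the symlinks on the affected node. The following is a minimal sketch using standard Linux utilities; `/dev/sdb` is a placeholder device name:

```bash
# List the by-id symlinks that resolve to the device (placeholder /dev/sdb)
ls -l /dev/disk/by-id/ | grep -w sdb

# Alternatively, print all symlinks that udev maintains for the device
udevadm info --query=symlink --name=/dev/sdb
```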
- Open the `CephDeployment` CR for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the `nodes` section, remove the required device from `devices`. When using device filters, update the `deviceFilter` or `devicePathFilter` regexp accordingly. For example:

  ```yaml
  spec:
    nodes:
    - name: <nodeName> 
      devices:
      - name: <deviceName> # remove the entire item from the devices list
        # fullPath: <deviceByPath> if the device is specified with a symlink instead of a name
        config:
          deviceClass: hdd
  ```
  Substitute `<nodeName>` with the node name where the device `<deviceName>` or `<deviceByPath>` is going to be replaced.

- Save the `CephDeployment` changes and close the editor.
- Create a `CephOsdRemoveTask` CR template and save it as `replace-failed-osd-<nodeName>-<deviceName>-task.yaml`:

  ```yaml
  apiVersion: lcm.mirantis.com/v1alpha1
  kind: CephOsdRemoveTask
  metadata:
    name: replace-failed-osd-<nodeName>-<deviceName>
    namespace: pelagia
  spec:
    nodes:
      <nodeName>:
        cleanupByDevice:
        - name: <deviceName>
          # If the device is specified with by-path or by-id instead of a name,
          # use path: <deviceByPath> or <deviceById>.
  ```
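  For illustration, a filled-in template might look as follows. The node name `storage-worker-52` matches the example used later in this procedure; the device identifier in the CR name and the `by-id` symlink are hypothetical values:

  ```yaml
  apiVersion: lcm.mirantis.com/v1alpha1
  kind: CephOsdRemoveTask
  metadata:
    name: replace-failed-osd-storage-worker-52-sdb
    namespace: pelagia
  spec:
    nodes:
      storage-worker-52:
        cleanupByDevice:
        # Hypothetical by-id symlink of the replaced device
        - path: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_2e52abb48862dbdc
  ```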
- Apply the template to the cluster:

  ```bash
  kubectl apply -f replace-failed-osd-<nodeName>-<deviceName>-task.yaml
  ```
- Verify that the corresponding request has been created:

  ```bash
  kubectl -n pelagia get cephosdremovetask
  ```
- Verify that the `removeInfo` section appeared in the `CephOsdRemoveTask` CR `status`:

  ```bash
  kubectl -n pelagia get cephosdremovetask replace-failed-osd-<nodeName>-<deviceName> -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    osdRemoveStatus:
      removeInfo:
        cleanupMap:
          <nodeName>:
            osdMapping:
              <osdId>:
                deviceMapping:
                  <dataDevice>:
                    deviceClass: hdd
                    devicePath: <dataDeviceByPath>
                    devicePurpose: block
                    usedPartition: /dev/ceph-d2d3a759-2c22-4304-b890-a2d87e056bd4/osd-block-ef516477-d2da-492f-8169-a3ebfc3417e2
                    zapDisk: true
  ```
  Definition of the values in angle brackets:

  - `<nodeName>` - underlying node name of the machine, for example, `storage-worker-52`
  - `<osdId>` - Ceph OSD ID for the device being replaced, for example, `1`
  - `<dataDeviceByPath>` - `by-path` of the device placed on the node, for example, `/dev/disk/by-path/pci-0000:00:1t.9`
  - `<dataDevice>` - name of the device placed on the node, for example, `/dev/sdb`
- Verify that the `cleanupMap` section matches the required removal and wait for the `ApproveWaiting` phase to appear in `status`:

  ```bash
  kubectl -n pelagia get cephosdremovetask replace-failed-osd-<nodeName>-<deviceName> -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    phase: ApproveWaiting
  ```
- Edit the `CephOsdRemoveTask` CR and set the `approve` flag to `true`:

  ```bash
  kubectl -n pelagia edit cephosdremovetask replace-failed-osd-<nodeName>-<deviceName>
  ```

  For example:

  ```yaml
  spec:
    approve: true
  ```
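  Alternatively, a non-interactive `kubectl patch` call can set the same flag; the following is a sketch using the task created earlier:

  ```bash
  kubectl -n pelagia patch cephosdremovetask replace-failed-osd-<nodeName>-<deviceName> \
    --type merge -p '{"spec":{"approve":true}}'
  ```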
- Review the following `status` fields of the Ceph LCM CR processing:

  - `status.phase` - the current state of the request processing
  - `status.messages` - the description of the current phase
  - `status.conditions` - the full history of the request processing before the current phase
  - `status.removeInfo.issues` and `status.removeInfo.warnings` - error and warning messages that occurred during the request processing, if any
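  To poll only the current phase instead of reading the whole CR, a JSONPath query can be used, for example:

  ```bash
  kubectl -n pelagia get cephosdremovetask replace-failed-osd-<nodeName>-<deviceName> \
    -o jsonpath='{.status.phase}{"\n"}'
  ```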
- Verify that the `CephOsdRemoveTask` has been completed. For example:

  ```yaml
  status:
    phase: Completed # or CompletedWithWarnings if there are non-critical issues
  ```
- Remove the device cleanup jobs:

  ```bash
  kubectl delete jobs -n pelagia -l app=pelagia-lcm-cleanup-disks
  ```
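  To verify that no cleanup jobs are left, you can list the jobs by the same label; once the removal succeeds, the command should report that no resources were found:

  ```bash
  kubectl -n pelagia get jobs -l app=pelagia-lcm-cleanup-disks
  ```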
Remove a failed Ceph OSD by Ceph OSD ID #
- Identify the node and device names used by the affected Ceph OSD. Using the Ceph CLI in the `pelagia-ceph-toolbox` Pod, run:

  ```bash
  kubectl -n rook-ceph exec -it deploy/pelagia-ceph-toolbox -- ceph osd metadata <osdId>
  ```

  Substitute `<osdId>` with the affected OSD ID.

  Example output:

  ```json
  {
      "id": 1,
      ...
      "bluefs_db_devices": "vdc",
      ...
      "bluestore_bdev_devices": "vde",
      ...
      "devices": "vdc,vde",
      ...
      "hostname": "kaas-node-6c5e76f9-c2d2-4b1a-b047-3c299913a4bf",
      ...
  }
  ```
  In the example above, `hostname` is the node name and `devices` lists all devices used by the affected Ceph OSD.
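  To extract only the two fields of interest, the same command can be combined with a filter. The following is a sketch that relies on the JSON output format of the Ceph CLI:

  ```bash
  kubectl -n rook-ceph exec deploy/pelagia-ceph-toolbox -- \
    ceph osd metadata <osdId> -f json-pretty | grep -E '"hostname"|"devices"'
  ```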
- Open the `CephDeployment` CR for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```

- In the `nodes` section, remove the required device:

  ```yaml
  spec:
    nodes:
    - name: <nodeName>
      devices:
      - name: <deviceName> # remove the entire item from the devices list
        config:
          deviceClass: hdd
  ```
  Substitute `<nodeName>` with the node name where the device `<deviceName>` is going to be replaced.

- Save the `CephDeployment` changes and close the editor.
- Create a `CephOsdRemoveTask` CR template and save it as `replace-failed-<nodeName>-osd-<osdId>-task.yaml`:

  ```yaml
  apiVersion: lcm.mirantis.com/v1alpha1
  kind: CephOsdRemoveTask
  metadata:
    name: replace-failed-<nodeName>-osd-<osdId>
    namespace: pelagia
  spec:
    nodes:
      <nodeName>:
        cleanupByOsdId:
        - id: <osdId>
  ```
- Apply the template to the cluster:

  ```bash
  kubectl apply -f replace-failed-<nodeName>-osd-<osdId>-task.yaml
  ```
- Verify that the corresponding request has been created:

  ```bash
  kubectl -n pelagia get cephosdremovetask
  ```
- Verify that the `removeInfo` section appeared in the `CephOsdRemoveTask` CR `status`:

  ```bash
  kubectl -n pelagia get cephosdremovetask replace-failed-<nodeName>-osd-<osdId> -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    osdRemoveStatus:
      removeInfo:
        cleanupMap:
          <nodeName>:
            osdMapping:
              <osdId>:
                deviceMapping:
                  <dataDevice>:
                    deviceClass: hdd
                    devicePath: <dataDeviceByPath>
                    devicePurpose: block
                    usedPartition: /dev/ceph-d2d3a759-2c22-4304-b890-a2d87e056bd4/osd-block-ef516477-d2da-492f-8169-a3ebfc3417e2
                    zapDisk: true
  ```
  Definition of the values in angle brackets:

  - `<nodeName>` - underlying node name of the machine, for example, `storage-worker-52`
  - `<osdId>` - Ceph OSD ID for the device being replaced, for example, `1`
  - `<dataDeviceByPath>` - `by-path` of the device placed on the node, for example, `/dev/disk/by-path/pci-0000:00:1t.9`
  - `<dataDevice>` - name of the device placed on the node, for example, `/dev/sdb`
- Verify that the `cleanupMap` section matches the required removal and wait for the `ApproveWaiting` phase to appear in `status`:

  ```bash
  kubectl -n pelagia get cephosdremovetask replace-failed-<nodeName>-osd-<osdId> -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    phase: ApproveWaiting
  ```
- Edit the `CephOsdRemoveTask` CR and set the `approve` flag to `true`:

  ```bash
  kubectl -n pelagia edit cephosdremovetask replace-failed-<nodeName>-osd-<osdId>
  ```

  For example:

  ```yaml
  spec:
    approve: true
  ```
- Review the following `status` fields of the Ceph LCM CR processing:

  - `status.phase` - the current state of the request processing
  - `status.messages` - the description of the current phase
  - `status.conditions` - the full history of the request processing before the current phase
  - `status.removeInfo.issues` and `status.removeInfo.warnings` - error and warning messages that occurred during the request processing, if any
- Verify that the `CephOsdRemoveTask` has been completed. For example:

  ```yaml
  status:
    phase: Completed # or CompletedWithWarnings if there are non-critical issues
  ```
- Remove the device cleanup jobs:

  ```bash
  kubectl delete jobs -n pelagia -l app=pelagia-lcm-cleanup-disks
  ```
Deploy a new device after removal of a failed one #
Note

You can spawn a Ceph OSD on a raw device, but the device must be clean and without any data or partitions. If you want to add a device that was previously in use, also ensure it is raw and clean. To clean up all data and partitions from a device, refer to the official Rook documentation.
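As an illustration of such a cleanup, the following sketch follows the approach described in the Rook documentation. Run it on the node that owns the disk, with `DISK` pointing at the replacement device (placeholder value), and only after double-checking the device path, because the commands are destructive:

```bash
DISK="/dev/sdX"   # placeholder; set to the replacement device

# Wipe the partition table and other GPT/MBR structures
sgdisk --zap-all "$DISK"

# Zero the beginning of the disk to remove leftover Ceph and LVM metadata
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync

# For SSDs and NVMe devices, discarding all blocks also works
# blkdiscard "$DISK"
```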
- Manually prepare the replacement device on the existing node.

- Optional. If you want to add a Ceph OSD on top of a raw device that already exists on a node or is hot-plugged, add the required device using the following guidelines:

  - You can add a raw device to a node during node deployment.
  - If a node supports adding devices without a node reboot, you can hot plug a raw device to a node.
  - If a node does not support adding devices without a node reboot, you can hot plug a raw device during node shutdown.
- Open the `CephDeployment` CR for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the `nodes` section, add a new device:

  ```yaml
  spec:
    nodes:
    - name: <nodeName>
      devices:
      - fullPath: <deviceByID> # Recommended. Non-wwn by-id symlink.
        # name: <deviceName> # Not recommended. If the device is supposed to be added by device name.
        # fullPath: <deviceByPath> # Not recommended. If the device is supposed to be added by the by-path symlink.
        config:
          deviceClass: hdd
  ```
  Substitute `<nodeName>` with the node name where the device `<deviceByID>`, `<deviceName>`, or `<deviceByPath>` is going to be added as a Ceph OSD.
- Verify that the Ceph OSD on the specified node is successfully deployed. The `CephDeploymentHealth` CR `status.healthReport.cephDaemons.cephDaemons` section should not contain any issues:

  ```bash
  kubectl -n pelagia get cephdeploymenthealth -o yaml
  ```

  For example:

  ```yaml
  status:
    healthReport:
      cephDaemons:
        cephDaemons:
          osd:
            info:
            - 3 osds, 3 up, 3 in
            status: ok
  ```
- Verify that the desired Ceph OSD Pod is `Running`:

  ```bash
  kubectl -n rook-ceph get pod -l app=rook-ceph-osd -o wide | grep <nodeName>
  ```
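  To additionally confirm that the new Ceph OSD joined the cluster and is `up` and `in`, you can check the OSD tree from the toolbox Pod, for example:

  ```bash
  kubectl -n rook-ceph exec deploy/pelagia-ceph-toolbox -- ceph osd tree
  ```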