# Replace a failed Ceph OSD with a metadata device as a logical volume path
You can apply the below procedure in the following cases:

- A Ceph OSD failed without a data or metadata device outage. In this case, first remove the failed Ceph OSD and clean up all corresponding disks and partitions. Then add a new Ceph OSD to the same data and metadata paths.
- A Ceph OSD failed with a data or metadata device outage. In this case, also first remove the failed Ceph OSD and clean up all corresponding disks and partitions. Then add a new Ceph OSD to a newly replaced data device with the same metadata path.
Note
The below procedure also applies to manually created metadata partitions.
## Remove a failed Ceph OSD by ID with a defined metadata device
- Identify the ID of the Ceph OSD related to the failed device. For example, use the Ceph CLI in the `pelagia-ceph-toolbox` Pod:

  ```bash
  ceph osd metadata
  ```

  Example of system response:

  ```json
  {
      "id": 0,
      ...
      "bluestore_bdev_devices": "vdc",
      ...
      "devices": "vdc",
      ...
      "hostname": "kaas-node-6c5e76f9-c2d2-4b1a-b047-3c299913a4bf",
      ...
      "pod_name": "rook-ceph-osd-0-7b8d4d58db-f6czn",
      ...
  },
  {
      "id": 1,
      ...
      "bluefs_db_devices": "vdf",
      ...
      "bluestore_bdev_devices": "vde",
      ...
      "devices": "vde,vdf",
      ...
      "hostname": "kaas-node-6c5e76f9-c2d2-4b1a-b047-3c299913a4bf",
      ...
      "pod_name": "rook-ceph-osd-1-78fbc47dc5-px9n2",
      ...
  },
  ...
  ```
- Open the `CephDeployment` custom resource (CR) for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the `nodes` section:

  - Find and capture the `metadataDevice` path to reuse it during re-creation of the Ceph OSD.
  - Remove the required device. Example configuration snippet:

    ```yaml
    spec:
      nodes:
      - name: <nodeName>
        devices:
        - name: <deviceName> # remove the entire item from the devices list
          # fullPath: <deviceByPath> if device is specified using by-path instead of name
          config:
            deviceClass: hdd
            metadataDevice: /dev/bluedb/meta_1
    ```

    In the example above, `<nodeName>` is the name of the node on which the device `<deviceName>` or `<deviceByPath>` must be replaced.
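  For illustration only, assuming the failed OSD uses the data device `vde` on the node `storage-worker-3` (hypothetical names based on the examples in this section), the entry to remove would look similar to the following:

  ```yaml
  spec:
    nodes:
    - name: storage-worker-3
      devices:
      - name: vde                             # remove this entire list item
        config:
          deviceClass: hdd
          metadataDevice: /dev/bluedb/meta_1  # capture this path for the re-creation procedure
  ```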
- Create a `CephOsdRemoveTask` CR template and save it as `replace-failed-osd-<nodeName>-<osdID>-task.yaml`:

  ```yaml
  apiVersion: lcm.mirantis.com/v1alpha1
  kind: CephOsdRemoveTask
  metadata:
    name: replace-failed-osd-<nodeName>-<osdID>
    namespace: pelagia
  spec:
    nodes:
      <nodeName>:
        cleanupByOsdId:
        - id: <osdID>
  ```

  Substitute the following parameters:

  - `<nodeName>` with the node name from the previous step;
  - `<osdID>` with the ID of the affected Ceph OSD.
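  For example, a filled-in template for the hypothetical node `storage-worker-3` and Ceph OSD ID `1` could look as follows:

  ```yaml
  apiVersion: lcm.mirantis.com/v1alpha1
  kind: CephOsdRemoveTask
  metadata:
    name: replace-failed-osd-storage-worker-3-1
    namespace: pelagia
  spec:
    nodes:
      storage-worker-3:
        cleanupByOsdId:
        - id: 1
  ```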
- Apply the template to the cluster:

  ```bash
  kubectl apply -f replace-failed-osd-<nodeName>-<osdID>-task.yaml
  ```
- Verify that the corresponding task has been created:

  ```bash
  kubectl -n pelagia get cephosdremovetask
  ```
- Verify that the `status` section of `CephOsdRemoveTask` contains the `removeInfo` section:

  ```bash
  kubectl -n pelagia get cephosdremovetask replace-failed-osd-<nodeName>-<osdID> -o yaml
  ```

  Example of system response:

  ```yaml
  removeInfo:
    cleanupMap:
      <nodeName>:
        osdMapping:
          "<osdID>":
            deviceMapping:
              <dataDevice>:
                deviceClass: hdd
                devicePath: <dataDeviceByPath>
                devicePurpose: block
                usedPartition: /dev/ceph-d2d3a759-2c22-4304-b890-a2d87e056bd4/osd-block-ef516477-d2da-492f-8169-a3ebfc3417e2
                zapDisk: true
              <metadataDevice>:
                deviceClass: hdd
                devicePath: <metadataDeviceByPath>
                devicePurpose: db
                usedPartition: /dev/bluedb/meta_1
            uuid: ef516477-d2da-492f-8169-a3ebfc3417e2
  ```

  Definition of values in angle brackets:

  - `<nodeName>` - underlying node name of the machine, for example, `storage-worker-3`
  - `<osdID>` - Ceph OSD ID for the device being replaced, for example, `1`
  - `<dataDeviceByPath>` - `by-path` of the data device placed on the node, for example, `/dev/disk/by-path/pci-0000:00:1t.9`
  - `<dataDevice>` - name of the data device placed on the node, for example, `/dev/vde`
  - `<metadataDevice>` - name of the metadata device placed on the node, for example, `/dev/vdf`
  - `<metadataDeviceByPath>` - `by-path` of the metadata device placed on the node, for example, `/dev/disk/by-path/pci-0000:00:12.0`
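  To inspect only the `removeInfo` section instead of the full CR, you can also use a `jsonpath` query, for example (output formatting may differ from the YAML view):

  ```bash
  kubectl -n pelagia get cephosdremovetask replace-failed-osd-<nodeName>-<osdID> \
    -o jsonpath='{.status.removeInfo}'
  ```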
- Verify that the `cleanupMap` section matches the required removal and wait for the `ApproveWaiting` phase to appear in `status`:

  ```bash
  kubectl -n pelagia get cephosdremovetask replace-failed-osd-<nodeName>-<osdID> -o yaml
  ```

  Example of system response:

  ```yaml
  status:
    phase: ApproveWaiting
  ```
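  As an optional sketch, if your `kubectl` version supports `--for=jsonpath` (v1.23 or later), you can block until the phase is reached instead of polling manually:

  ```bash
  kubectl -n pelagia wait cephosdremovetask/replace-failed-osd-<nodeName>-<osdID> \
    --for=jsonpath='{.status.phase}'=ApproveWaiting --timeout=30m
  ```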
- In the `CephOsdRemoveTask` CR, set the `approve` flag to `true`:

  ```bash
  kubectl -n pelagia edit cephosdremovetask replace-failed-osd-<nodeName>-<osdID>
  ```

  Configuration snippet:

  ```yaml
  spec:
    approve: true
  ```
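  Alternatively, as a non-interactive sketch, you can set the flag with a merge patch:

  ```bash
  kubectl -n pelagia patch cephosdremovetask replace-failed-osd-<nodeName>-<osdID> \
    --type merge -p '{"spec":{"approve":true}}'
  ```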
- Review the following `status` fields of the Ceph LCM CR processing:

  - `status.phase` - the current state of task processing;
  - `status.messages` - the description of the current phase;
  - `status.conditions` - the full history of task processing before the current phase;
  - `status.removeInfo.issues` and `status.removeInfo.warnings` - error and warning messages that occurred during task processing, if any.
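  For example, to follow the phase changes as the task progresses, you can watch the resource:

  ```bash
  kubectl -n pelagia get cephosdremovetask replace-failed-osd-<nodeName>-<osdID> -w
  ```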
- Verify that the `CephOsdRemoveTask` has been completed. For example:

  ```yaml
  status:
    phase: Completed # or CompletedWithWarnings if there are non-critical issues
  ```
## Re-create a Ceph OSD with the same metadata partition
Note
You can spawn a Ceph OSD on a raw device, but it must be clean and without any data or partitions. If you want to add a device that was in use, also ensure it is raw and clean. To clean up all data and partitions from a device, refer to the official Rook documentation.
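As a rough sketch of such a cleanup, assuming the device to wipe is `/dev/vde` (a placeholder) and that `sgdisk` is installed on the node, the approach described in the Rook documentation boils down to zapping the partition table and the first megabytes of the disk. These commands are destructive, so double-check the node and device before running them:

```bash
# Destructive: wipe the partition table and the beginning of the disk (placeholder device /dev/vde)
sgdisk --zap-all /dev/vde
dd if=/dev/zero of=/dev/vde bs=1M count=100 oflag=direct,dsync
```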
- Optional. If you want to add a Ceph OSD on top of a raw device that already exists on a node or is hot-plugged, add the required device using the following guidelines:

  - You can add a raw device to a node during node deployment.
  - If a node supports adding devices without a node reboot, you can hot plug a raw device to the node.
  - If a node does not support adding devices without a node reboot, you can hot plug a raw device during node shutdown.
- Open the `CephDeployment` CR for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the `nodes` section, add the replaced device with the same `metadataDevice` path as on the removed Ceph OSD. For example:

  ```yaml
  spec:
    nodes:
    - name: <nodeName>
      devices:
      - fullPath: <deviceByID> # Recommended. Add a new device by the by-id symlink, for example, /dev/disk/by-id/...
        #name: <deviceByID> # Not recommended. Add a new device by ID, for example, /dev/disk/by-id/...
        #fullPath: <deviceByPath> # Not recommended. Add a new device by path, for example, /dev/disk/by-path/...
        config:
          deviceClass: hdd
          metadataDevice: /dev/bluedb/meta_1 # Must match the value of the previously removed OSD
  ```

  Substitute `<nodeName>` with the node name where the new device `<deviceByID>` or `<deviceByPath>` must be added.
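  For illustration, a filled-in entry for the hypothetical node `storage-worker-3` with a new disk referenced by a hypothetical `by-id` symlink could look as follows:

  ```yaml
  spec:
    nodes:
    - name: storage-worker-3
      devices:
      - fullPath: /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_replacement-disk  # hypothetical by-id symlink of the new device
        config:
          deviceClass: hdd
          metadataDevice: /dev/bluedb/meta_1  # same path as on the removed Ceph OSD
  ```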
- Wait for the replaced disk to apply to the Ceph cluster as a new Ceph OSD. You can monitor the application state using either the `status` section of the `CephDeploymentHealth` CR or the `pelagia-ceph-toolbox` Pod:

  ```bash
  kubectl -n pelagia get cephdeploymenthealth -o yaml
  kubectl -n rook-ceph exec -it deploy/pelagia-ceph-toolbox -- ceph -s
  ```
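  Once the new Ceph OSD is created, you can additionally confirm that it is `up` and placed on the expected host, for example:

  ```bash
  kubectl -n rook-ceph exec -it deploy/pelagia-ceph-toolbox -- ceph osd tree
  ```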