
Maintenance stuck on a compact Ceph cluster

Warning

If a Ceph pool uses the non-recommended replicated.size of less than 3, Ceph OSD removal cannot be performed. The minimum replica size equals the rounded-up half of the specified replicated.size.

For example, if replicated.size is 2, the minimum replica size is 1, and if replicated.size is 3, the minimum replica size is 2. A minimum replica size of 1 allows Ceph to have PGs with only one Ceph OSD in the acting state, which may cause a PG_TOO_DEGRADED health warning that blocks Ceph OSD removal. Mirantis recommends setting replicated.size to 3 for each Ceph pool.
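The rounding rule above can be restated as a one-line calculation. This is an illustrative sketch of the arithmetic only, not part of the Ceph CLI; the function name is hypothetical:

```python
def min_replica_size(replicated_size: int) -> int:
    """Rounded-up half of a pool's replicated.size (integer ceiling division)."""
    return (replicated_size + 1) // 2

# replicated.size 2 -> minimum replica size 1 (a PG may act on a single OSD)
print(min_replica_size(2))
# replicated.size 3 -> minimum replica size 2 (the recommended configuration)
print(min_replica_size(3))
```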

When a Ceph node is disabled or removed during upgrade or maintenance operations such as a rolling reboot, Ceph may not complete rebalancing if only two of three OSD nodes remain active. The CephDeployment object can remain in the Maintenance state, and the rebalance process waits indefinitely for Ceph to become ready.

The issue may only affect environments that combine:

- a small number of Ceph OSD nodes (for example, three);
- a pool replica count set to one less than the number of storage nodes (replicas = storage_nodes_count - 1);
- the host failure domain.
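The affected configuration can be expressed as a simple predicate. The helper below is hypothetical and only restates the conditions above; treating "a small number of nodes" as three or fewer is an assumption made for illustration:

```python
def is_affected(osd_nodes: int, pool_replicas: int, failure_domain: str) -> bool:
    """True when a cluster matches the affected configuration described above:
    few OSD nodes, replicas one less than the node count, host failure domain.
    The threshold of 3 nodes is an assumption for this sketch."""
    return (
        osd_nodes <= 3
        and pool_replicas == osd_nodes - 1
        and failure_domain == "host"
    )

# Three nodes, replicas=2, host failure domain: matches the affected setup.
print(is_affected(3, 2, "host"))
# With the recommended replicated.size of 3 on three nodes, it does not match.
print(is_affected(3, 3, "host"))
```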

To resolve the issue, run the following command for the affected Ceph OSD node, where <osdId> is the ID of the Ceph OSD to drain. Setting the reweight value to 0 instructs Ceph to move all PGs off that OSD, which unblocks the stuck operation:

ceph osd reweight <osdId> 0