Move a Ceph Monitor daemon to another node
This document describes how to migrate a Ceph Monitor daemon from one node to another without changing the total number of Ceph Monitors in the cluster. In the Pelagia Controllers concept, migration of a Ceph Monitor means manually removing it from one node and adding it to another.
Consider the following example placement scheme of Ceph Monitors in the
nodes spec of the CephDeployment custom resource (CR):
```yaml
spec:
  nodes:
    node-1:
      roles:
      - mon
      - mgr
    node-2:
      roles:
      - mgr
```
Using the example above, if you want to move the Ceph Monitor from node-1
to node-2 without changing the number of Ceph Monitors, the roles configuration
in the nodes spec must result in the following:
```yaml
spec:
  nodes:
    node-1:
      roles:
      - mgr
    node-2:
      roles:
      - mgr
      - mon
```
However, due to a Rook limitation related to the Kubernetes architecture, once
you move the Ceph Monitor through the CephDeployment CR, the change does not
apply automatically. This is caused by the following Rook behavior:

- Rook creates Ceph Monitor resources as deployments with a nodeSelector, which binds Ceph Monitor pods to the requested node (see the example after this list).
- Rook does not recreate Ceph Monitors with the new node placement while the current mon quorum works.

Therefore, to move a Ceph Monitor to another node, you must also manually apply the new Ceph Monitor placement to the Ceph cluster as described below.
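For illustration, you can inspect this binding on an existing Ceph Monitor deployment. The rook-ceph-mon-a name below assumes a monitor with the letter a and will differ in your cluster:

```bash
# Print the nodeSelector that pins a Ceph Monitor pod to its node
# (rook-ceph-mon-a is an assumed example name)
kubectl -n rook-ceph get deploy rook-ceph-mon-a -o jsonpath='{.spec.template.spec.nodeSelector}'
```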
Move a Ceph Monitor to another node
- Open the CephDeployment CR for editing:

  ```bash
  kubectl -n pelagia edit cephdpl
  ```
- In the nodes spec of the CephDeployment CR, change the mon roles placement
  without changing the total number of mon roles. For details, see the example
  above. Note the nodes from which the mon roles have been removed and save the
  name value of those nodes.
- Obtain the name of the rook-ceph-mon deployment placed on the obsolete node
  using the previously obtained node name:

  ```bash
  kubectl -n rook-ceph get deploy -l app=rook-ceph-mon -o jsonpath="{.items[?(@.spec.template.spec.nodeSelector['kubernetes\.io/hostname'] == '<nodeName>')].metadata.name}"
  ```

  Substitute <nodeName> with the name of the node where you removed the mon role.
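  For illustration, assuming the mon role was removed from node-1 as in the
  example above, the substituted command and a hypothetical result might look
  as follows:

  ```bash
  kubectl -n rook-ceph get deploy -l app=rook-ceph-mon -o jsonpath="{.items[?(@.spec.template.spec.nodeSelector['kubernetes\.io/hostname'] == 'node-1')].metadata.name}"
  # Hypothetical output:
  # rook-ceph-mon-c
  ```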
- Back up the rook-ceph-mon deployment placed on the obsolete node:

  ```bash
  kubectl -n rook-ceph get deploy <rook-ceph-mon-name> -o yaml > <rook-ceph-mon-name>-backup.yaml
  ```
- Remove the rook-ceph-mon deployment placed on the obsolete node:

  ```bash
  kubectl -n rook-ceph delete deploy <rook-ceph-mon-name>
  ```
- Wait approximately 10 minutes until rook-ceph-operator performs a failover of
  the Pending mon pod. Inspect the logs during the failover process:

  ```bash
  kubectl -n rook-ceph logs -l app=rook-ceph-operator -f
  ```

  Example of a log extract:

  ```
  2021-03-15 17:48:23.471978 W | op-mon: mon "a" not found in quorum, waiting for timeout (554 seconds left) before failover
  ```
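  Optionally, to follow the failover from the pod perspective, you can watch the
  Ceph Monitor pods and the nodes they are scheduled on (a minimal sketch using
  the same label as in the commands above):

  ```bash
  kubectl -n rook-ceph get pod -l app=rook-ceph-mon -o wide -w
  ```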
- If the failover process fails (see the command sketch after this list):

  - Scale down the rook-ceph-operator deployment to 0 replicas.
  - Apply the backed-up rook-ceph-mon deployment.
  - Scale back the rook-ceph-operator deployment to 1 replica.
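A minimal sketch of those recovery sub-steps, assuming the backup file created earlier and the default rook-ceph-operator deployment name:

```bash
# Stop the operator so it does not recreate or modify Ceph Monitor resources
kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas=0

# Restore the backed-up Ceph Monitor deployment
kubectl -n rook-ceph apply -f <rook-ceph-mon-name>-backup.yaml

# Start the operator again
kubectl -n rook-ceph scale deploy rook-ceph-operator --replicas=1
```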
Once done, Rook removes the obsolete Ceph Monitor from the node and creates
a new one on the specified node with a new letter. For example, if the a,
b, and c Ceph Monitors were in quorum and mon-c was obsolete, Rook
removes mon-c and creates mon-d. In this case, the new quorum includes
the a, b, and d Ceph Monitors.
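To verify the result, you can check on which nodes the Ceph Monitor pods now run and, if a rook-ceph-tools (Ceph toolbox) deployment is available in your cluster, confirm the new quorum. The toolbox deployment name below is an assumption and may differ in your setup:

```bash
# List Ceph Monitor pods and the nodes they are scheduled on
kubectl -n rook-ceph get pod -l app=rook-ceph-mon -o wide

# Check the Ceph Monitor quorum (assumes a rook-ceph-tools deployment exists)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mon stat
```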