Ceph health reports PG_DAMAGED after a failed disk or node replacement
After adding a new OSD node on a compact cluster, Ceph may report
HEALTH_ERR, with the output of the ceph health detail command containing
PG_DAMAGED and OSD_SCRUB_ERRORS messages. For example:
$ ceph -s
cluster:
id: 8bca9dfb-df99-4920-bba0-e5bca59876b4
health: HEALTH_ERR
1 scrub errors
Possible data damage: 1 pg inconsistent
services:
mon: 3 daemons, quorum a,b,c (age 3h)
mgr: a(active, since 3h), standbys: b
osd: 4 osds: 4 up (since 109m), 4 in (since 110m)
rgw: 2 daemons active (2 hosts, 1 zones)
$ ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
[ERR] OSD_SCRUB_ERRORS: 1 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 11.2a is active+clean+inconsistent, acting [3,1]
To fix the PG_DAMAGED health error:
- Obtain the damaged placement group (PG) ID:

  ceph health detail

  Example of system response:

  HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
  [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
  [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
      pg 11.2a is active+clean+inconsistent, acting [3,1]

  In the example above, 11.2a is the required PG ID.

- Repair the damaged PG:

  ceph pg repair <pgid>

  Substitute <pgid> with the damaged PG ID. For example:

  ceph pg repair 11.2a
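The two steps above can be combined into a small shell sketch. The function names below are hypothetical helpers, and the parsing assumes the plain-text `pg <pgid> is active+clean+inconsistent, acting [...]` line format shown in the example response; verify the output format against your Ceph release before relying on it.

```shell
# Hypothetical helper: read `ceph health detail` output on stdin and
# print the IDs of PGs reported as inconsistent, one per line.
extract_inconsistent_pgs() {
    awk '/inconsistent, acting/ {print $2}'
}

# Hypothetical helper: repair every inconsistent PG currently reported.
# Requires a running cluster and admin keyring.
repair_inconsistent_pgs() {
    ceph health detail | extract_inconsistent_pgs | while read -r pgid; do
        ceph pg repair "$pgid"
    done
}
```

In practice, investigate the scrub errors (for example, check the backing disks of the acting OSDs) before issuing the repair, since recurring inconsistencies often indicate failing media rather than a one-off error.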