Quick recovery after multiple Ceph mons go down

2017-08-10 09:51:54      Source: kong62's Blog
# ceph -s
cluster 0fbf2746-8132-4944-af64-e29e24e871bb
health HEALTH_WARN
1 mons down, quorum 1,2 ceph01,ceph03
monmap e3: 3 mons at {ceph01=172.28.13.58:6789/0,ceph02=172.28.13.59:6789/0,ceph03=172.28.13.60:6789/0}
election epoch 616, quorum 1,2 ceph01,ceph03
…….

The output above shows that ceph02 is down.
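
Spotting the missing mon by eye works for three monitors; on a larger cluster the same comparison can be scripted. The following is a minimal sketch, assuming the `ceph -s` text layout shown above (a `monmap` line with `{name=addr,...}` and an `election epoch` line that ends with the quorum member names). The function name `down_mons` is made up for this illustration, and newer Ceph releases print a different status layout.

```shell
#!/bin/sh
# down_mons: read `ceph -s` text on stdin and print every mon name that
# appears in the monmap line but not in the quorum list of the election line.
down_mons() {
    awk '
        # monmap line: mon names are the keys inside {...}
        /monmap/ {
            lb = index($0, "{")
            rb = index($0, "}")
            body = substr($0, lb + 1, rb - lb - 1)
            n = split(body, pairs, ",")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                all[kv[1]] = 1
            }
        }
        # election line: the last field is the comma-separated quorum members
        /election epoch/ {
            m = split($NF, members, ",")
            for (i = 1; i <= m; i++) inquorum[members[i]] = 1
        }
        END {
            for (name in all) if (!(name in inquorum)) print name
        }
    ' | sort
}

# Hypothetical usage on a node with an admin keyring:
#   ceph -s | down_mons
```

Fed the status output above, this should print only the mon that is missing from the quorum list.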

Remove that mon from the cluster
# ceph mon remove ceph02

Clean up that mon's data directory
# rm -rf /var/lib/ceph/mon/ceph-ceph02

Initialize the store.db database for the ceph02 mon
# ceph-mon --mkfs -i ceph02 --keyring /etc/ceph/ceph.mon.keyring

Create the empty done marker file
# touch /var/lib/ceph/mon/ceph-ceph02/done

Create the keyring for the ceph02 mon
# ceph auth get-or-create mon.ceph02 mon 'allow rwx' osd 'allow *' -o /var/lib/ceph/mon/ceph-ceph02/keyring

Create the empty marker file for the service manager:
If the daemon is managed by sysvinit, use this:
# touch /var/lib/ceph/mon/ceph-ceph02/sysvinit
If it is managed by systemd (systemctl), use this:
# touch /var/lib/ceph/mon/ceph-ceph02/systemd

Fix the file ownership
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-ceph02

Add the mon back to the cluster
# ceph mon add ceph02 172.28.13.59:6789

Restart the service
# systemctl restart ceph-mon@ceph02.service

The restart fails with an error:
Apr 26 17:14:48 ceph02 systemd[1]: ceph-mon@ceph02.service failed.
Apr 26 17:14:48 ceph02 polkitd[741]: Unregistered Authentication Agent for unix-process:29956:253026270 (system bus name :1.10026, object path /org/freedesktop/PolicyKit1/AuthenticationAg
Apr 26 17:14:54 ceph02 polkitd[741]: Registered Authentication Agent for unix-process:29988:253026886 (system bus name :1.10027 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path
Apr 26 17:14:54 ceph02 systemd[1]: start request repeated too quickly for ceph-mon@ceph02.service
Apr 26 17:14:54 ceph02 systemd[1]: Failed to start Ceph cluster monitor daemon.
-- Subject: Unit ceph-mon@ceph02.service has failed

After a daemon-reload, restarting again works:
# systemctl daemon-reload
# systemctl restart ceph-mon@ceph02.service
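
The "start request repeated too quickly" line in the log indicates that systemd's start rate limit for the unit was hit after the failed attempts. The daemon-reload above cleared it in this case; `systemctl reset-failed <unit>` is another way to reset a unit's failed state, and on systemd versions that document it, its start rate counter as well. Below is a small sketch for detecting that condition in journal text. The function name `hit_start_limit` is hypothetical, and the usage in the trailing comment assumes root on the mon host.

```shell
#!/bin/sh
# hit_start_limit UNIT: succeed if the journal text on stdin contains
# systemd's start-rate-limit message for UNIT, fail otherwise.
hit_start_limit() {
    unit=$1
    grep -F "start request repeated too quickly for $unit" >/dev/null
}

# Hypothetical usage (not run here; needs root and a running systemd):
#   unit=ceph-mon@ceph02.service
#   if journalctl -u "$unit" -n 50 --no-pager | hit_start_limit "$unit"; then
#       systemctl reset-failed "$unit"
#       systemctl start "$unit"
#   fi
```

Checking for the message first avoids masking a different failure: if the mon still exits on start for another reason, resetting the limiter alone will not help, and the daemon's own log is the place to look.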

# ceph -s
cluster 0fbf2746-8132-4944-af64-e29e24e871bb
health HEALTH_OK
monmap e7: 3 mons at {ceph01=172.28.13.58:6789/0,ceph02=172.28.13.59:6789/0,ceph03=172.28.13.60:6789/0}
election epoch 626, quorum 0,1,2 ceph01,ceph02,ceph03
fsmap e98: 1/1/1 up {0=ceph01=up:active}
osdmap e280: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v41214: 594 pgs, 16 pools, 2197 MB data, 854 objects
7669 MB used, 592 GB / 599 GB avail
594 active+clean
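
With the mon back in quorum, the steps of this walkthrough can be collected into one function for reuse. This is only a sketch that mirrors the commands as given in this post. It assumes everything runs as root on the host of the failed mon, that an admin keyring is available there, and that the host uses systemd. The names `run`, `rebuild_mon` and `DRY_RUN` are made up for illustration. `DRY_RUN` defaults to 1, so the commands are only printed unless it is explicitly set to 0.

```shell
#!/bin/sh
# Recap of the walkthrough. By default nothing is executed: each command is
# printed with a leading "+" (without shell quoting). Set DRY_RUN=0 to run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# rebuild_mon NAME ADDR   e.g. rebuild_mon ceph02 172.28.13.59:6789
rebuild_mon() {
    name=$1
    addr=$2
    # Refuse to continue with empty arguments, since the path below is
    # later passed to rm -rf.
    if [ -z "$name" ] || [ -z "$addr" ]; then
        echo "usage: rebuild_mon NAME ADDR" >&2
        return 1
    fi
    dir=/var/lib/ceph/mon/ceph-$name

    run ceph mon remove "$name"                 # drop the dead mon from the monmap
    run rm -rf "$dir"                           # clear its old data directory
    run ceph-mon --mkfs -i "$name" --keyring /etc/ceph/ceph.mon.keyring
    run touch "$dir/done"
    run ceph auth get-or-create "mon.$name" mon 'allow rwx' osd 'allow *' -o "$dir/keyring"
    run touch "$dir/systemd"                    # "$dir/sysvinit" on sysvinit hosts
    run chown -R ceph:ceph "$dir"
    run ceph mon add "$name" "$addr"
    run systemctl daemon-reload                 # the plain restart failed without this above
    run systemctl restart "ceph-mon@$name.service"
}
```

Calling `rebuild_mon ceph02 172.28.13.59:6789` with the default `DRY_RUN` prints the ten commands for review before anything is changed on the host.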

Verify that no data was lost
# ceph osd pool ls
# rados -p rbd ls
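
The two commands above can be combined to get an object count for every pool instead of listing a single pool by hand. The sketch below uses only `ceph osd pool ls` and `rados -p <pool> ls` from this post; the function name `pool_object_counts` is made up. Listing every object can be slow on large pools, and the counts are only meaningful as a check when compared with numbers recorded before the incident, if any exist.

```shell
#!/bin/sh
# pool_object_counts: print "<pool> <object count>" for every pool,
# one pool per line, in the order `ceph osd pool ls` reports them.
pool_object_counts() {
    ceph osd pool ls | while IFS= read -r pool; do
        [ -n "$pool" ] || continue
        # </dev/null keeps rados from consuming the pool list on stdin
        count=$(rados -p "$pool" ls </dev/null | wc -l | tr -d ' ')
        printf '%s %s\n' "$pool" "$count"
    done
}

# Hypothetical usage on a node with an admin keyring:
#   pool_object_counts
```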