In my Kubernetes cluster (v1.14.7), one node didn't recover correctly after a cluster update. The Rook OSD from that node didn't get rescheduled (as explained in the documentation), so I'm trying to add a new OSD manually.
My ceph status returns this:
and my ceph osd tree returns this:
I tried to link the new OSD to the node using ceph osd crush set osd.0 0.29199 root=default host=gke-dev-dev-110dd9ec-ntww
but it returns: Error ENOENT: unable to set item id 0 name 'osd.0' weight 0.29199 at location {host=gke-dev-dev-110dd9ec-ntww,root=default}: does not exist
Do you have any clue how to fix this? Thanks in advance.
For Rook users, see the OSD management docs: https://rook.io/docs/rook/master/ceph-osd-mgmt.html
A blog post with an explanation (in Chinese): https://zhuanlan.zhihu.com/p/140486398
Here's what I suggest: instead of trying to add a new OSD right away, fix/remove the defective one and it should be re-created.
Try this (a consolidated script sketch follows the list):
1 - mark the OSD out: ceph osd out osd.0
2 - remove it from the crush map: ceph osd crush remove osd.0
3 - delete its auth caps: ceph auth del osd.0
4 - remove the OSD: ceph osd rm osd.0
5 - delete the deployment: kubectl delete deployment -n your-cluster-namespace rook-ceph-osd-0
6 - edit out the config section for your OSD id and its underlying device:
kubectl edit configmap -n your-cluster-namespace rook-ceph-osd-nodename-config
and delete the {"/var/lib/rook":x} entry for the removed OSD.
7 - restart the rook-operator pod by deleting it
8 - verify the health of your cluster: ceph -s; ceph osd tree
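For convenience, here is the whole sequence as a single shell sketch. The namespace, OSD id, node name, and operator label below are placeholders/assumptions taken from the steps above, not values from your cluster; adjust them before running anything, and run the ceph commands wherever you normally run the ceph CLI (e.g. the rook-ceph-tools pod).
```bash
#!/usr/bin/env bash
# Consolidated sketch of steps 1-8 above. All values below are assumptions;
# substitute your own namespace, OSD id, and node name.
set -euo pipefail

NS="your-cluster-namespace"        # Rook cluster namespace
OSD_ID=0                           # id of the defective OSD
NODE="gke-dev-dev-110dd9ec-ntww"   # node that hosted the OSD

# 1-4: remove the OSD from Ceph
ceph osd out "osd.${OSD_ID}"
ceph osd crush remove "osd.${OSD_ID}"
ceph auth del "osd.${OSD_ID}"
ceph osd rm "osd.${OSD_ID}"

# 5: delete the OSD deployment
kubectl -n "${NS}" delete deployment "rook-ceph-osd-${OSD_ID}"

# 6: remove the {"/var/lib/rook":x} entry for this OSD from the per-node
#    config (opens an interactive editor)
kubectl -n "${NS}" edit configmap "rook-ceph-osd-${NODE}-config"

# 7: restart the operator by deleting its pod (label is an assumption;
#    on older releases the operator may live in another namespace,
#    e.g. rook-ceph-system)
kubectl -n "${NS}" delete pod -l app=rook-ceph-operator

# 8: verify cluster health
ceph -s
ceph osd tree
```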
Hope this helps!