I'm deploying rook-ceph into a minikube cluster. Everything seems to be working. I added 3 unformatted disks to the VM and they're connected. The problem I'm having is that when I run ceph status, I get a health warning that says "1 pg undersized". How exactly do I fix this?
The documentation (https://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/) states: "If you are trying to create a cluster on a single node, you must change the default of the osd crush chooseleaf type setting from 1 (meaning host or node) to 0 (meaning osd) in your Ceph configuration file before you create your monitors and OSDs." I don't know where to make this configuration change, but if there's any other way to fix this that I should know of, please let me know. Thanks!
As you mentioned in your question, you should change your CRUSH failure domain type to OSD, which means Ceph will replicate your data between OSDs rather than hosts. By default the failure domain is host, and when you have only one host there are no other hosts to replicate your data to, so your PG will always be undersized.
You should set osd crush chooseleaf type = 0 in your ceph.conf before you create your monitors and OSDs. This will replicate your data between OSDs rather than hosts.
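As a minimal sketch, the ceph.conf fragment would look like this (placement in the [global] section follows standard Ceph config conventions; adjust to your deployment):

```ini
# ceph.conf -- must be in place before monitors and OSDs are created
[global]
# 0 = osd, 1 = host (the default); on a single-node cluster,
# choosing leaves at the OSD level lets PGs reach full size
osd crush chooseleaf type = 0
```

Note that this only affects rules generated after the change; an already-created cluster needs its CRUSH rule changed instead, as the other answer describes.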
I came across this problem installing Ceph using Rook (v1.5.7) with a single data-bearing host having multiple OSDs. The install shipped with a default CRUSH rule, replicated_rule, which had host as the default failure domain:
$ ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
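For comparison, the same rule in decompiled CRUSH map text form (roughly what crushtool -d produces; shown here only to make the host failure domain explicit) would look like:

```
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    # this is the line that forces replicas onto distinct hosts
    step chooseleaf firstn 0 type host
    step emit
}
```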
I had to find the pool associated with the "undersized" PG; luckily, in a default rook-ceph install there's only one:
$ ceph osd pool ls
device_health_metrics
$ ceph pg ls-by-pool device_health_metrics
PG OBJECTS DEGRADED ... STATE
1.0 0 0 ... active+undersized+remapped
And to confirm the pg is using the default rule:
$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule
Instead of modifying the default CRUSH rule, I opted to create a new replicated rule, but this time specifying the osd (aka device) type (docs: CRUSH map Types and Buckets), also assuming the default CRUSH root of default:
# osd crush rule create-replicated <name> <root> <type> [<class>]
$ ceph osd crush rule create-replicated replicated_rule_osd default osd
$ ceph osd crush rule dump replicated_rule_osd
{
    "rule_id": 1,
    "rule_name": "replicated_rule_osd",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_firstn",
            "num": 0,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}
And then assigning the new rule to the existing pool:
$ ceph osd pool set device_health_metrics crush_rule replicated_rule_osd
set pool 1 crush_rule to replicated_rule_osd
$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule_osd
Finally confirming pg state:
$ ceph pg ls-by-pool device_health_metrics
PG OBJECTS DEGRADED ... STATE
1.0 0 0 ... active+clean
New account so I can't add a comment; I wanted to expound on @zamnuts' answer, as I hit the same issue in my cluster with rook:v1.7.2. If you want to change the default device_health_metrics pool in the Rook/Ceph Helm chart or in the YAML, the following documents are relevant:
https://github.com/rook/rook/blob/master/deploy/examples/pool-device-health-metrics.yaml
https://github.com/rook/rook/blob/master/Documentation/helm-ceph-cluster.md
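As a sketch of what that override might look like (untested; field names are based on the linked pool-device-health-metrics.yaml example, so verify against your Rook version), a CephBlockPool resource can set the failure domain to osd so Rook creates the pool's CRUSH rule at the OSD level from the start:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  # must match the built-in pool so Rook adopts it (per the linked example)
  name: device-health-metrics
  namespace: rook-ceph
spec:
  name: device_health_metrics
  # replicate across OSDs instead of hosts (single-node cluster)
  failureDomain: osd
  replicated:
    size: 3
```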