CockroachDB on single-cluster Kubernetes: pods fail with CrashLoopBackOff

1/13/2019

Using VirtualBox and 4 x CentOS 7 OS installs.

Following a basic single-cluster Kubernetes install:

https://kubernetes.io/docs/setup/independent/install-kubeadm/
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

[root@k8s-master cockroach]# kubectl get nodes
NAME         STATUS   ROLES    AGE   VERSION
k8s-master   Ready    master   41m   v1.13.2
k8s-slave1   Ready    <none>   39m   v1.13.2
k8s-slave2   Ready    <none>   39m   v1.13.2
k8s-slave3   Ready    <none>   39m   v1.13.2

I have created 3 x NFS PVs on the master for my slaves to pick up as part of the cockroachdb-statefulset.yaml, as described here:

https://www.cockroachlabs.com/blog/running-cockroachdb-on-kubernetes/
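
For reference, each PV looks roughly like this (a sketch; the NFS server address and export path here are placeholders, not my real values):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: cockroachdbpv0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: 192.168.56.10          # placeholder: the NFS server (k8s-master) IP
    path: /mnt/nfs/cockroachdbpv0  # placeholder: the exported directory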

However, my CockroachDB pods just continually fail to communicate with each other.

[root@k8s-slave1 kubernetes]# kubectl get pods
NAME            READY   STATUS             RESTARTS   AGE
cockroachdb-0   0/1     CrashLoopBackOff   6          8m47s
cockroachdb-1   0/1     CrashLoopBackOff   6          8m47s
cockroachdb-2   0/1     CrashLoopBackOff   6          8m47s

[root@k8s-slave1 kubernetes]# kubectl get pvc
NAME                    STATUS   VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-cockroachdb-0   Bound    cockroachdbpv0   10Gi       RWO                           17m
datadir-cockroachdb-1   Bound    cockroachdbpv2   10Gi       RWO                           17m
datadir-cockroachdb-2   Bound    cockroachdbpv1   10Gi       RWO                           17m

...the cockroach pod logs do not really tell me why...

[root@k8s-slave1 kubernetes]# kubectl logs cockroachdb-0
++ hostname -f
+ exec /cockroach/cockroach start --logtostderr --insecure --advertise-host cockroachdb-0.cockroachdb.default.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb --cache 25% --max-sql-memory 25%
W190113 17:00:46.589470 1 cli/start.go:1055  RUNNING IN INSECURE MODE!

- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.

Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v2.1/secure-a-cluster.html
I190113 17:00:46.595544 1 server/status/recorder.go:609  available memory from cgroups (8.0 EiB) exceeds system memory 3.7 GiB, using system memory
I190113 17:00:46.600386 1 cli/start.go:1069  CockroachDB CCL v2.1.3 (x86_64-unknown-linux-gnu, built 2018/12/17 19:15:31, go1.10.3)
I190113 17:00:46.759727 1 server/status/recorder.go:609  available memory from cgroups (8.0 EiB) exceeds system memory 3.7 GiB, using system memory
I190113 17:00:46.759809 1 server/config.go:386  system total memory: 3.7 GiB
I190113 17:00:46.759872 1 server/config.go:388  server configuration:
max offset             500000000
cache size             947 MiB
SQL memory pool size   947 MiB
scan interval          10m0s
scan min idle time     10ms
scan max idle time     1s
event log enabled      true
I190113 17:00:46.759896 1 cli/start.go:913  using local environment variables: COCKROACH_CHANNEL=kubernetes-insecure
I190113 17:00:46.759909 1 cli/start.go:920  process identity: uid 0 euid 0 gid 0 egid 0
I190113 17:00:46.759919 1 cli/start.go:545  starting cockroach node
I190113 17:00:46.762262 22 storage/engine/rocksdb.go:574  opening rocksdb instance at "/cockroach/cockroach-data/cockroach-temp632709623"
I190113 17:00:46.803749 22 server/server.go:851  [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I190113 17:00:46.804168 22 storage/engine/rocksdb.go:574  opening rocksdb instance at "/cockroach/cockroach-data"
I190113 17:00:46.828487 22 server/config.go:494  [n?] 1 storage engine initialized
I190113 17:00:46.828526 22 server/config.go:497  [n?] RocksDB cache size: 947 MiB
I190113 17:00:46.828536 22 server/config.go:497  [n?] store 0: RocksDB, max size 0 B, max open file limit 60536
W190113 17:00:46.838175 22 gossip/gossip.go:1499  [n?] no incoming or outgoing connections
I190113 17:00:46.838260 22 cli/start.go:505  initial startup completed, will now wait for `cockroach init`
or a join to a running cluster to start accepting clients.
Check the log file(s) for progress.
I190113 17:00:46.841243 22 server/server.go:1402  [n?] no stores bootstrapped and --join flag specified, awaiting init command.
W190113 17:01:16.841095 89 cli/start.go:535  The server appears to be unable to contact the other nodes in the cluster. Please try:

- starting the other nodes, if you haven't already;
- double-checking that the '--join' and '--listen'/'--advertise' flags are set up correctly;
- running the 'cockroach init' command if you are trying to initialize a new cluster.

If problems persist, please see https://www.cockroachlabs.com/docs/v2.1/cluster-setup-troubleshooting.html.
I190113 17:01:31.357765 1 cli/start.go:756  received signal 'terminated'
I190113 17:01:31.359529 1 cli/start.go:821  initiating graceful shutdown of server
initiating graceful shutdown of server
I190113 17:01:31.361064 1 cli/start.go:872  too early to drain; used hard shutdown instead
too early to drain; used hard shutdown instead

...any ideas on how to debug this further?

-- Gareth
cockroachdb
kubernetes

3 Answers

1/26/2019

OK, it came down to the fact that I had NAT as my VirtualBox external-facing network adapter. I changed it to Bridged and it all started working perfectly. If anyone can tell me why, that would be awesome :)
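
One quick way to confirm the pods can now resolve and reach each other (a sketch using a throwaway busybox pod; the pod name dns-test is arbitrary):

$ kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- \
    nslookup cockroachdb-0.cockroachdb.default.svc.cluster.local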

-- Gareth
Source: StackOverflow

1/13/2019

I have gone through the *.yaml file at https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/cockroachdb-statefulset.yaml. I noticed that towards the bottom there is no storageClassName mentioned, which means that during the volume claim process the pods are going to look for the standard (default) storage class. I am not sure if you used the below annotation while provisioning the 3 NFS volumes -

storageclass.kubernetes.io/is-default-class=true
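
For context, that annotation is normally set on a StorageClass object rather than on the PVs themselves; a minimal sketch of applying it (the class name nfs-storage is a placeholder):

kubectl patch storageclass nfs-storage \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'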

You should be able to check the same using -

kubectl get storageclass

If the output does not show a standard (default) storage class, then I would suggest either readjusting the persistent volume definitions by adding the annotation, or adding an empty string as the storageClassName towards the end of the cockroachdb-statefulset.yaml file, as sketched below.
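
The empty-string variant would look roughly like this in the StatefulSet's volumeClaimTemplates (a sketch based on the upstream manifest; the storage request is illustrative):

volumeClaimTemplates:
- metadata:
    name: datadir
  spec:
    accessModes:
      - "ReadWriteOnce"
    storageClassName: ""   # bind to the pre-created PVs instead of a default StorageClass
    resources:
      requests:
        storage: 10Gi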

More detail (pod events and status) can be viewed using -

kubectl describe statefulset cockroachdb
-- Mukesh Sharma
Source: StackOverflow

7/28/2019

In my case, I used the Helm chart, like below:

$ helm install stable/cockroachdb \
  -n cockroachdb \
  --namespace cockroach \
  --set Storage=10Gi \
  --set NetworkPolicy.Enabled=true \
  --set Secure.Enabled=true

Then wait for the CSRs for CockroachDB to be added:

$ watch kubectl get csr

Several CSRs are pending:

$ kubectl get csr
NAME                                         AGE    REQUESTOR                                                   CONDITION
cockroachdb.client.root                      130m   system:serviceaccount:cockroachdb:cockroachdb-cockroachdb   Pending
cockroachdb.node.cockroachdb-cockroachdb-0   130m   system:serviceaccount:cockroachdb:cockroachdb-cockroachdb   Pending
cockroachdb.node.cockroachdb-cockroachdb-1   129m   system:serviceaccount:cockroachdb:cockroachdb-cockroachdb   Pending
cockroachdb.node.cockroachdb-cockroachdb-2   130m   system:serviceaccount:cockroachdb:cockroachdb-cockroachdb   Pending

To approve them, run the following command:

$ kubectl get csr -o json | \
  jq -r '.items[] | select(.metadata.name | contains("cockroachdb.")) | .metadata.name' | \
  xargs -n 1 kubectl certificate approve
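
With the CSRs approved, the pods should come up; a quick check (using the --namespace cockroach value from the helm install above):

$ kubectl get pods --namespace cockroach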
-- Bruno Wego
Source: StackOverflow