calico-node pods don't start after GKE cluster upgrade from 1.10.x to 1.11.x

12/4/2018

We upgraded our GKE cluster to 1.11.x and, although the process finished successfully, the cluster is not working. Multiple pods crash or stay pending, and everything points to Calico networking not working:

calico-node-2hhfz       1/2       CrashLoopBackOff    5          6m

Its log shows this info:

kubectl -n kube-system logs -f calico-node-2hhfz calico-node

Notice the errors at the end (the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)):

2018-12-04 11:22:39.617 [INFO][10] startup.go 252: Early log level set to info
2018-12-04 11:22:39.618 [INFO][10] startup.go 268: Using NODENAME environment for node name
2018-12-04 11:22:39.618 [INFO][10] startup.go 280: Determined node name: gke-apps-internas-apps-internas-4c-6r-ecf8b140-9p8x
2018-12-04 11:22:39.619 [INFO][10] startup.go 303: Checking datastore connection
2018-12-04 11:22:39.626 [INFO][10] startup.go 327: Datastore connection verified
2018-12-04 11:22:39.626 [INFO][10] startup.go 100: Datastore is ready
2018-12-04 11:22:39.632 [INFO][10] startup.go 1052: Running migration
2018-12-04 11:22:39.632 [INFO][10] migrate.go 866: Querying current v1 snapshot and converting to v3
2018-12-04 11:22:39.632 [INFO][10] migrate.go 875: handling FelixConfiguration (global) resource
2018-12-04 11:22:39.637 [INFO][10] migrate.go 875: handling ClusterInformation (global) resource
2018-12-04 11:22:39.637 [INFO][10] migrate.go 875: skipping FelixConfiguration (per-node) resources - not supported
2018-12-04 11:22:39.637 [INFO][10] migrate.go 875: handling BGPConfiguration (global) resource
2018-12-04 11:22:39.637 [INFO][10] migrate.go 600: Converting BGP config -> BGPConfiguration(default)
2018-12-04 11:22:39.644 [INFO][10] migrate.go 875: skipping Node resources - these do not need migrating
2018-12-04 11:22:39.644 [INFO][10] migrate.go 875: skipping BGPPeer (global) resources - these do not need migrating
2018-12-04 11:22:39.644 [INFO][10] migrate.go 875: handling BGPPeer (node) resources
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping HostEndpoint resources - not supported
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping IPPool resources - these do not need migrating
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping GlobalNetworkPolicy resources - these do not need migrating
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping Profile resources - these do not need migrating
2018-12-04 11:22:39.652 [INFO][10] migrate.go 875: skipping WorkloadEndpoint resources - these do not need migrating
2018-12-04 11:22:39.652 [INFO][10] migrate.go 875: data converted successfully
2018-12-04 11:22:39.652 [INFO][10] migrate.go 866: Storing v3 data
2018-12-04 11:22:39.652 [INFO][10] migrate.go 875: Storing resources in v3 format
2018-12-04 11:22:39.673 [INFO][10] migrate.go 1151: Failed to create resource Key=BGPConfiguration(default) error=resource does not exist: BGPConfiguration(default) with error: the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)
2018-12-04 11:22:39.673 [ERROR][10] migrate.go 884: Unable to store the v3 resources
2018-12-04 11:22:39.673 [INFO][10] migrate.go 875: cause: resource does not exist: BGPConfiguration(default) with error: the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)
2018-12-04 11:22:39.673 [ERROR][10] startup.go 107: Unable to ensure datastore is migrated. error=Migration failed: error storing converted data: resource does not exist: BGPConfiguration(default) with error: the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)
2018-12-04 11:22:39.673 [WARNING][10] startup.go 1066: Terminating
Calico node failed to start

Any idea how we can fix the cluster?

-- codependent
google-kubernetes-engine
kubernetes
project-calico

1 Answer

12/4/2018

There was a problem with the GKE upgrade process that leaves calico pods unable to start due to the lack of a custom resource definition for BGPConfiguration.
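
To confirm this diagnosis before changing anything, you can ask the API server whether the CRD is registered. This is only a minimal sketch: it assumes kubectl is already pointed at the affected cluster, and the function name check_calico_crd is made up for illustration. The CRD name is the one from the error message in the question.

```shell
# Hypothetical helper (not part of kubectl or Calico): reports whether the
# BGPConfiguration CRD named in the error message is registered.
check_calico_crd() {
  # "kubectl get crd NAME" exits non-zero when the CRD is not found.
  # Note: it also exits non-zero if the cluster cannot be reached.
  if kubectl get crd bgpconfigurations.crd.projectcalico.org >/dev/null 2>&1; then
    echo "present"
  else
    echo "missing"
  fi
}

check_calico_crd
```

If this prints "missing" while the cluster is otherwise reachable, the migration error in the calico-node log is consistent with the CRD never having been created.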

Applying the corresponding CRD to the cluster solved the problem:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgpconfigurations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BGPConfiguration
    plural: bgpconfigurations
    singular: bgpconfiguration 
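
For completeness, one way to apply the manifest above and then check on the calico-node pods. Treat it as a sketch under assumptions: the file name bgpconfiguration-crd.yaml and the function name apply_calico_crd are made up for illustration, and kubectl is assumed to be pointed at the affected cluster.

```shell
# Hypothetical file name; save the manifest above under it first.
CRD_FILE="bgpconfiguration-crd.yaml"

# Hypothetical helper: registers the CRD, then lists the calico-node pods.
apply_calico_crd() {
  # Only list pods if the apply succeeded.
  kubectl apply -f "$CRD_FILE" &&
    kubectl -n kube-system get pods | grep calico-node
}
```

Run apply_calico_crd once the file exists. Pods in CrashLoopBackOff are restarted automatically, so they should pick up the new CRD on their next restart; repeat the get pods command after a minute or two to see whether they reach Running.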
-- codependent
Source: StackOverflow