Eclipse Che - volume mount error when 5 users launch workspaces at the same time

3/24/2021

Configuration:

  • Google Kubernetes Engine (GKE) version - 1.18.12-gke.1210
  • Nodes count - 2
  • Node configuration - 2 cores, 8 GB memory, 30 GB hard disk
  • Autoscaling is enabled

Eclipse Che advanced configuration:

server:
  CHE_WORKSPACE_POOL_EXACT__SIZE: "60"
  CHE_WORKSPACE_STORAGE_PREFERRED__TYPE: ephemeral
  allowUserDefinedWorkspaceNamespaces: false
  cheDebug: "false"
  cheFlavor: che
  cheHost: che-eclipse-che.domain.com
  cheLogLevel: INFO
  cheServerIngress: {}
  cheServerRoute: {}
  devfileRegistryIngress: {}
  devfileRegistryRoute: {}
  externalDevfileRegistry: false
  externalPluginRegistry: false
  gitSelfSignedCert: false
  pluginRegistryIngress: {}
  pluginRegistryRoute: {}
  selfSignedCert: true
  tlsSupport: true
  useInternalClusterSVCNames: true
  workspaceNamespaceDefault: all-che-workspace
storage:
  preCreateSubPaths: true
  pvcClaimSize: 128Gi
  pvcStrategy: common
  preferred_type: persistent

Bug Description :

I logged in as 10 different users at the same time and launched the 10 users' workspaces simultaneously. 3 to 5 users can launch their workspace successfully. The remaining users get a timeout error while the volume is being mounted, and for some users the workspace keeps loading and nothing is initialised in the log window.

Error Screenshots :

  • Error for User 1:

Failed to run the workspace: "Unrecoverable event occurred: 'FailedMount', 'Unable to attach or mount volumes: unmounted volumes=claim-che-workspace, unattached volumes=gitconfigvolume remote-endpoint che-workspace-token-dmn68 workspacep72ony0ucs0pqa5c-sshprivatekeys che-ca-certs broker-config-volume5iwa24 ssshkeyconfigvolume claim-che-workspace che-jwtproxy-config-volume: timed out waiting for the condition', 'workspacep72ony0ucs0pqa5c.maven-d5476444f-6tcgg'"

  • Error for User 2:

    Failed to run the workspace: "Waiting for Kubernetes environment 'default' of the workspace'workspaceo6i4zoqzs1xym88w' reached timeout"

-- Ashhar Azeez
docker
eclipse-che
google-kubernetes-engine
kubernetes

2 Answers

3/25/2021

Try running the Che workspace in debug mode; see https://www.eclipse.org/che/docs/che-7/end-user-guide/investigating-failures-at-a-workspace-start-using-the-verbose-mode/ for details. It might give you some hints about what is going wrong.

-- Sergii Kabashniuk
Source: StackOverflow

3/29/2021

The most probable cause of the issue in the question (which I've managed to reproduce) is the storage configuration in use.

Citing the documentation:

When the common PVC strategy is in use, user-defined PVCs are ignored and volumes that refer to these user-defined PVCs are replaced with a volume that refers to the common PVC. In this strategy, all Che workspaces use the same PVC. When the user runs one workspace, it only binds to one node in the cluster at a time.

-- Eclipse.org: Che: Docs: Configuring storage strategies: The common PVC strategy

By default, creating a PVC on GKE provisions a Persistent Disk, which can be mounted in RWO (ReadWriteOnce) access mode by a single node only. If a workspace Pod is scheduled onto a node other than the one the disk is already attached to, workspace creation fails with the following message:

Unable to attach or mount volumes: unmounted volumes=[claim-che-workspace], unattached volumes=[che-ca-certs che-workspace-token-5tb9b broker-config-volumeavmw4x claim-che-workspace workspacebgnsca7mkryv3s3m-sshprivatekeys ssshkeyconfigvolume gitconfigvolume]: timed out waiting for the condition
Failed to run the workspace: "Plugins installation process failed. Error: Unrecoverable event occurred: 'FailedMount', 'Unable to attach or mount volumes: unmounted volumes=[claim-che-workspace], unattached volumes=[che-ca-certs che-workspace-token-5tb9b broker-config-volumeavmw4x claim-che-workspace workspacebgnsca7mkryv3s3m-sshprivatekeys ssshkeyconfigvolume gitconfigvolume]: timed out waiting for the condition', 'workspacebgnsca7mkryv3s3m.che-plugin-broker'"
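
To check which access mode the shared claim actually has, you can read it from the PVC itself. The following is a minimal sketch, not part of the original setup: the function names and the `KUBECTL` override variable are hypothetical helpers, and it assumes the claim is named `claim-che-workspace` like the volume in the error message.

```shell
# Print the access modes declared on a PVC.
# KUBECTL is a hypothetical override for the binary to call; it defaults to kubectl.
pvc_access_modes() {
  namespace="$1"
  claim="$2"
  "${KUBECTL:-kubectl}" get pvc "$claim" -n "$namespace" \
    -o jsonpath='{.spec.accessModes}'
}

# Succeed only when the claim offers ReadWriteMany,
# i.e. when it can be mounted read-write from several nodes.
pvc_is_shared() {
  pvc_access_modes "$1" "$2" | grep -q 'ReadWriteMany'
}
```

For example, `pvc_is_shared kruk-che claim-che-workspace || echo "single-node claim"` would print the message for a claim that only offers ReadWriteOnce.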

To avoid this issue, I'd reckon you could do one of the following (not tested):

  • Use an RWX (ReadWriteMany) storage solution.
  • Change the PVC strategy in the storage configuration to unique.
  • Use a single node. <-- not a great idea, but it should work
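
As a rough sketch of the second option, the strategy could be switched by patching the CheCluster custom resource. This assumes an operator-managed installation whose resource is named `eclipse-che` in the `eclipse-che` namespace and whose `spec.storage.pvcStrategy` field corresponds to the `storage:` block in the question; the function name and the `KUBECTL` override are hypothetical. Like the list above, it is untested against a live cluster.

```shell
# Switch the Che PVC strategy by merge-patching the CheCluster resource.
# Assumptions: resource name "eclipse-che", default namespace "eclipse-che".
# KUBECTL is a hypothetical override for the binary to call.
set_pvc_strategy() {
  strategy="$1"
  namespace="${2:-eclipse-che}"
  # Reject anything that is not a Che 7 PVC strategy.
  case "$strategy" in
    common|per-workspace|unique) ;;
    *) echo "unknown PVC strategy: $strategy" >&2; return 1 ;;
  esac
  "${KUBECTL:-kubectl}" patch checluster/eclipse-che -n "$namespace" \
    --type=merge \
    -p "{\"spec\":{\"storage\":{\"pvcStrategy\":\"$strategy\"}}}"
}
```

Calling `set_pvc_strategy unique` would request one PVC per workspace volume instead of a single shared claim, so it may be worth revisiting the 128Gi `pvcClaimSize` before applying it.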

To get more insight into the issue, you can monitor the state of the Pods while a workspace is being created. It could tell you more about the process:

  • $ kubectl get pods -n kruk-che -o wide
NAMESPACE       NAME                                                        READY   STATUS              RESTARTS   AGE    IP            NODE                 NOMINATED NODE   READINESS GATES
kruk-che        workspace4pfd7goxamv8vvs4.maven-5d9c59746f-swfdd            5/5     Running             0          29m    10.4.0.58     gke-name-pool-9530   <none>           <none>
kruk-che        workspacebgnsca7mkryv3s3m.che-plugin-broker                 0/1     ContainerCreating   0          105s   <none>        gke-name-pool-qhns   <none>           <none>
kruk-che        workspacezo0l50kaa2zvm4lv.maven-8644fbf959-45xjd            5/5     Running             0          12m    10.4.0.62     gke-name-pool-9530   <none>           <none>
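
To make the node placement easier to read, a listing like the one above can be regrouped by node. This is a small sketch with a hypothetical helper name; it assumes the wide output format shown here, where no column value contains spaces.

```shell
# Read `kubectl get pods ... -o wide` output on stdin and print
# "NODE STATUS POD" lines sorted by node.
# Column positions are taken from the header line.
pods_by_node() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "NAME")   name = i
        if ($i == "STATUS") status = i
        # Take the first NODE header; "NOMINATED NODE" comes later.
        if ($i == "NODE" && !node) node = i
      }
      next
    }
    { print $node, $status, $name }
  ' | sort
}
```

Piping the listing through `pods_by_node` puts the two Running Pods next to each other under `gke-name-pool-9530`, while the Pod stuck in ContainerCreating shows up alone under `gke-name-pool-qhns`.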

In the Pod listing above, one Pod is stuck in the ContainerCreating state. You can inspect such a Pod for more information about its state, for example:

$ kubectl describe pod -n kruk-che workspacebgnsca7mkryv3s3m.che-plugin-broker

Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           83s   default-scheduler        Successfully assigned kruk-che/workspacebgnsca7mkryv3s3m.che-plugin-broker to gke-name-pool-qhns
  Warning  FailedAttachVolume  83s   attachdetach-controller  Multi-Attach error for volume "pvc-869bc565-4dd1-4362-8d63-f3b1fde6f246" Volume is already used by pod(s) workspacezo0l50kaa2zvm4lv.maven-8644fbf959-45xjd, workspace4pfd7goxamv8vvs4.maven-5d9c59746f-swfdd
  Warning  FailedMount         10s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[claim-che-workspace], unattached volumes=[che-ca-certs che-workspace-token-5tb9b broker-config-volumeavmw4x claim-che-workspace workspacebgnsca7mkryv3s3m-sshprivatekeys ssshkeyconfigvolume gitconfigvolume]: timed out waiting for the condition
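
Instead of describing each Pod in turn, the same attach failures can be pulled from the namespace's event list. A minimal sketch, with a hypothetical function name and `KUBECTL` override; it filters on the `FailedAttachVolume` reason seen in the events above.

```shell
# List FailedAttachVolume events in a namespace, oldest first.
# KUBECTL is a hypothetical override for the binary to call; it defaults to kubectl.
attach_failures() {
  namespace="$1"
  "${KUBECTL:-kubectl}" get events -n "$namespace" \
    --field-selector reason=FailedAttachVolume \
    --sort-by=.metadata.creationTimestamp
}
```

Running `attach_failures kruk-che` while several users start workspaces should surface any Multi-Attach errors as they occur.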

-- Dawid Kruk
Source: StackOverflow