Zalenium: 504 Gateway Time-out in OpenShift environment

3/18/2019

My Zalenium installation in an OpenShift environment is far from stable. The web UI (admin view with VNC, dashboard, Selenium console) works only about 50% of the time, and connecting with a RemoteWebDriver does not work at all.

Error:
504 Gateway Time-out The server didn't respond in time.

WebDriver error:

org.openqa.selenium.WebDriverException: Unable to parse remote response: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>


    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:115)

oc version:
oc v3.9.0+191fece
kubernetes v1.9.1+a0ce1bc657

Zalenium template:

apiVersion: v1
kind: Template
metadata:
  name: zalenium
  annotations:
    "openshift.io/display-name": "Zalenium"
    "description": "Disposable Selenium Grid for use in OpenShift"
message: |-
  A Zalenium grid has been created in your project. Continue to overview to verify that it exists and start the deployment.
parameters:
- name: PROJECTNAME
  description: The namespace / project name of this project
  displayName: Namespace
  required: true
- name: HOSTNAME
  description: hostname used for route creation
  displayName: route hostname
  required: true
- name: "VOLUME_CAPACITY"
  displayName: "Volume capacity for the disk that contains the test results."
  description: "The volume is used to store all the test results, including logs and video recordings of the tests."
  value: "10Gi"
  required: true
objects:
- apiVersion: v1
  kind: DeploymentConfig
  metadata:
    generation: 1
    labels:
      app: zalenium
      role: hub
    name: zalenium
  spec:
    replicas: 1
    selector:
      app: zalenium
      role: hub
    strategy:
      activeDeadlineSeconds: 21600
      resources: {}
      type: Rolling
    template:
      metadata:
        labels:
          app: zalenium
          role: hub
      spec:
        containers:
        - args:
          - start
          - --seleniumImageName
          - "elgalu/selenium:latest"
          - --sendAnonymousUsageInfo
          - "false"
          image: dosel/zalenium:latest
          imagePullPolicy: Always
          name: zalenium
          ports:
          - containerPort: 4444
            protocol: TCP
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /home/seluser/videos
            name: zalenium-volume
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        securityContext: {}
        serviceAccount: deployer
        serviceAccountName: deployer
        volumes:
        - name: zalenium-volume
          persistentVolumeClaim:
            claimName: zalenium-pvc
    test: false
    triggers:
    - type: ConfigChange
- apiVersion: v1
  kind: Route
  metadata:
    labels:
      app: zalenium
    annotations:
      openshift.io/host.generated: 'true'
      haproxy.router.openshift.io/timeout: "60"
    name: zalenium
  spec:
    host: zalenium-4444-${PROJECTNAME}.${HOSTNAME}
    to:
      kind: Service
      name: zalenium
    port:
      targetPort: selenium-4444
- apiVersion: v1
  kind: Route
  metadata:
    labels:
      app: zalenium
    annotations:
      openshift.io/host.generated: 'true'
      haproxy.router.openshift.io/timeout: "60"
    name: zalenium-4445
  spec:
    host: zalenium-4445-${PROJECTNAME}.${HOSTNAME}
    to:
      kind: Service
      name: zalenium
    port:
      targetPort: selenium-4445
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: zalenium
    name: zalenium
  spec:
    ports:
    - name: selenium-4444
      port: 4444
      protocol: TCP
      targetPort: 4444
    - name: selenium-4445
      port: 4445
      protocol: TCP
      targetPort: 4445
    selector:
      app: zalenium
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    labels:
      app: zalenium
    name: zalenium-pvc
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: ${VOLUME_CAPACITY}
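The "504 Gateway Time-out / The server didn't respond in time." page quoted above matches HAProxy's default 504 error page, and the OpenShift router is HAProxy-based. So the timeout is most likely raised by the route (both routes in this template set haproxy.router.openshift.io/timeout: "60"), not by Zalenium itself. Below is a minimal Java 11+ sketch that uses only java.net.http. It times a few requests to the hub through the route and flags responses that look like that router page. The class name HubProbe, the five attempts and the /wd/hub/status path (the standard Selenium Grid hub status endpoint, assumed to be reachable through this route) are illustrative assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class HubProbe {

    // Mirrors the first Route's host: zalenium-4444-${PROJECTNAME}.${HOSTNAME}
    // (plain HTTP, since the route defines no TLS). The status path is an assumption.
    static URI statusUri(String projectName, String hostname) {
        return URI.create("http://zalenium-4444-" + projectName + "." + hostname + "/wd/hub/status");
    }

    // True when the response looks like the router's 504 page quoted in the question,
    // i.e. the router stopped waiting before the hub answered.
    static boolean isRouterTimeout(int status, String body) {
        return status == 504 && body.contains("The server didn't respond in time");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: java HubProbe <PROJECTNAME> <HOSTNAME>");
            return;
        }
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(statusUri(args[0], args[1]))
                // Client timeout above the route's 60 s, so a router 504 arrives as a
                // response instead of being cut off on the client side first.
                .timeout(Duration.ofSeconds(90))
                .GET()
                .build();
        for (int attempt = 1; attempt <= 5; attempt++) {
            long start = System.nanoTime();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("attempt %d: HTTP %d after %d ms%s%n", attempt, response.statusCode(), elapsedMs,
                    isRouterTimeout(response.statusCode(), response.body()) ? " (router timeout page)" : "");
        }
    }
}
```

If the status endpoint answers quickly but new sessions still fail, session creation may take longer than the 60-second route timeout, for example when it involves starting a new browser pod. Raising the route timeout would be one thing to try, but that is a hypothesis, not a confirmed fix.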

Errors in the main pod:
I get about 2-3 of these errors every 30 minutes.

[OkHttp https://172.17.0.1/ ...] ERROR i.f.k.c.d.i.ExecWebSocketListener - Exec Failure: HTTP:403. Message:pods "zalenium-40000-wvpjb" is forbidden: User "system:serviceaccount:PROJECT:deployer" cannot get pods/exec in the namespace "PROJECT": User "system:serviceaccount:PROJECT:deployer" cannot get pods/exec in project "PROJECT"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
[OkHttp https://172.17.0.1/ ...] ERROR d.z.e.z.c.k.KubernetesContainerClient - zalenium-40000-wvpjb Failed to execute command [bash, -c, notify 'Zalenium', 'TEST COMPLETED', --icon=/home/seluser/images/completed.png]
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
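Each of these 403 messages names the identity that was refused, the verb, and the resource. A Role rule has to grant exactly those. The small regex sketch below pulls the fields out. The class ForbiddenMessage and its pattern are hypothetical. They are fitted to the messages quoted in this question, not to any documented message format.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ForbiddenMessage {
    final String user;
    final String verb;
    final String resource;
    final String namespace;

    private ForbiddenMessage(String user, String verb, String resource, String namespace) {
        this.user = user;
        this.verb = verb;
        this.resource = resource;
        this.namespace = namespace;
    }

    // Matches: User "<user>" cannot <verb> <resource> in the namespace "<ns>"
    private static final Pattern DENIAL = Pattern.compile(
            "User \"([^\"]+)\" cannot (\\S+) (\\S+) in the namespace \"([^\"]+)\"");

    // Returns the first denial found in the log message, if any.
    static Optional<ForbiddenMessage> parse(String message) {
        Matcher m = DENIAL.matcher(message);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new ForbiddenMessage(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    // Service account users are named system:serviceaccount:<namespace>:<name>.
    Optional<String> serviceAccountName() {
        if (!user.startsWith("system:serviceaccount:")) {
            return Optional.empty();
        }
        return Optional.of(user.substring(user.lastIndexOf(':') + 1));
    }

    public static void main(String[] args) {
        String log = "pods \"zalenium-40000-wvpjb\" is forbidden: User \"system:serviceaccount:PROJECT:deployer\" "
                + "cannot get pods/exec in the namespace \"PROJECT\"";
        parse(log).ifPresent(d -> System.out.println(
                d.serviceAccountName().orElse(d.user) + " needs '" + d.verb + "' on '" + d.resource
                        + "' in " + d.namespace));
    }
}
```

For the message above, it reports that deployer needs 'get' on 'pods/exec' in PROJECT. The same parse applied to the later log shows zalenium-sa being refused 'get' on 'pods'.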

With my own service account (YAML template of the Role, ServiceAccount and RoleBinding):

- apiVersion: v1
  kind: Role
  metadata:
    name: zalenium-role
    labels:
      app: zalenium
  rules:
  - apiGroups:
    - ""
    attributeRestrictions: null
    resources:
    - pods
    verbs:
    - create
    - delete
    - deletecollection
    - get
    - list
    - watch
  - apiGroups:
    - ""
    attributeRestrictions: null
    resources:
    - pods/exec
    verbs:
    - create
    - delete
    - list
    - get
  - apiGroups:
    - ""
    attributeRestrictions: null
    resources:
    - services
    verbs:
    - create
    - delete
    - get
    - list
- apiVersion: v1
  kind: ServiceAccount
  metadata:
    labels:
      app: zalenium
    name: zalenium-sa
- apiVersion: v1
  kind: RoleBinding
  metadata:
    labels:
      app: zalenium
    name: zalenium-rolebinding
  roleRef:
    kind: Role
    name: zalenium-role
    namespace: ${PROJECTNAME}
  subjects:
  - kind: ServiceAccount
    name: zalenium-sa
    namespace: ${PROJECTNAME}
  userNames:
  - zalenium-sa
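Kubernetes-style RBAC is purely additive. A request is allowed when at least one rule lists its API group, resource and verb. Below is a simplified Java sketch of that check, with the three rules of zalenium-role copied in by hand. It ignores API groups and wildcards, and Rule, allows and ZALENIUM_ROLE are hypothetical names, not an OpenShift API. It checks the role against the two permissions the logs complain about.

```java
import java.util.List;
import java.util.Set;

public class RoleCheck {

    // One entry of the Role's "rules" list (all rules in the template use apiGroups "").
    static final class Rule {
        final Set<String> resources;
        final Set<String> verbs;

        Rule(Set<String> resources, Set<String> verbs) {
            this.resources = resources;
            this.verbs = verbs;
        }
    }

    // RBAC is additive: one rule listing both the resource and the verb is enough.
    static boolean allows(List<Rule> rules, String verb, String resource) {
        for (Rule rule : rules) {
            if (rule.resources.contains(resource) && rule.verbs.contains(verb)) {
                return true;
            }
        }
        return false;
    }

    // The rules of zalenium-role, transcribed from the template.
    static final List<Rule> ZALENIUM_ROLE = List.of(
            new Rule(Set.of("pods"), Set.of("create", "delete", "deletecollection", "get", "list", "watch")),
            new Rule(Set.of("pods/exec"), Set.of("create", "delete", "list", "get")),
            new Rule(Set.of("services"), Set.of("create", "delete", "get", "list")));

    public static void main(String[] args) {
        // The two denials seen in the logs.
        System.out.println("get pods: " + allows(ZALENIUM_ROLE, "get", "pods"));
        System.out.println("get pods/exec: " + allows(ZALENIUM_ROLE, "get", "pods/exec"));
    }
}
```

Both checks come out true. In the log, the pod running as zalenium-sa is still refused "get" on "pods". So the problem is more likely that the grant never reaches that account (for example, the RoleBinding not taking effect) than a missing verb in the Role. This is an inference from the logs, not a confirmed diagnosis.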

Result (log of the hub pod):

--WARN 10:22:28:182931026 We don't have sudo
Kubernetes service account found.
Copying files for Dashboard...
Starting Nginx reverse proxy...
Starting Selenium Hub...
.....10:22:29.626 [main] INFO  o.o.grid.selenium.GridLauncherV3 - Selenium server version: 3.141.59, revision: unknown
.10:22:29.771 [main] INFO  o.o.grid.selenium.GridLauncherV3 - Launching Selenium Grid hub on port 4445
..10:22:30.292 [main] INFO  d.z.e.z.c.k.KubernetesContainerClient - Initialising Kubernetes support
..10:22:30.700 [main] WARN  d.z.e.z.c.k.KubernetesContainerClient - Error initialising Kubernetes support.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://172.30.0.1/api/v1/namespaces/PROJECT/pods/zalenium-1-j6s4q . Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "zalenium-1-j6s4q" is forbidden: User "system:serviceaccount:PROJECT:zalenium-sa" cannot get pods in the namespace "PROJECT": User "system:serviceaccount:PROJECT:zalenium-sa" cannot get pods in project "PROJECT".
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:476)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:413)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:381)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:313)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:296)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:794)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:210)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:177)
    at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.<init>(KubernetesContainerClient.java:91)
    at de.zalando.ep.zalenium.container.ContainerFactory.createKubernetesContainerClient(ContainerFactory.java:43)
    at de.zalando.ep.zalenium.container.ContainerFactory.getContainerClient(ContainerFactory.java:22)
    at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.<clinit>(DockeredSeleniumStarter.java:63)
    at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(ZaleniumRegistry.java:97)
    at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(ZaleniumRegistry.java:83)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.openqa.grid.web.Hub.<init>(Hub.java:94)
    at org.openqa.grid.selenium.GridLauncherV3.lambda$buildLaunchers$5(GridLauncherV3.java:264)
    at org.openqa.grid.selenium.GridLauncherV3.lambda$launch$0(GridLauncherV3.java:86)
    at java.util.Optional.map(Optional.java:215)
    at org.openqa.grid.selenium.GridLauncherV3.launch(GridLauncherV3.java:86)
    at org.openqa.grid.selenium.GridLauncherV3.main(GridLauncherV3.java:70)
10:22:30.701 [main] INFO  d.z.e.z.c.k.KubernetesContainerClient - About to clean up any left over docker-selenium pods created by Zalenium
Exception in thread "main" org.openqa.grid.common.exception.GridConfigurationException: Error creating class with de.zalando.ep.zalenium.registry.ZaleniumRegistry : null
    at org.openqa.grid.web.Hub.<init>(Hub.java:99)
    at org.openqa.grid.selenium.GridLauncherV3.lambda$buildLaunchers$5(GridLauncherV3.java:264)
    at org.openqa.grid.selenium.GridLauncherV3.lambda$launch$0(GridLauncherV3.java:86)
    at java.util.Optional.map(Optional.java:215)
    at org.openqa.grid.selenium.GridLauncherV3.launch(GridLauncherV3.java:86)
    at org.openqa.grid.selenium.GridLauncherV3.main(GridLauncherV3.java:70)
Caused by: java.lang.ExceptionInInitializerError
    at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(ZaleniumRegistry.java:97)
    at de.zalando.ep.zalenium.registry.ZaleniumRegistry.<init>(ZaleniumRegistry.java:83)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.openqa.grid.web.Hub.<init>(Hub.java:94)
    ... 5 more
Caused by: java.lang.NullPointerException
    at java.util.TreeMap.putAll(TreeMap.java:313)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.withLabels(BaseOperation.java:426)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.withLabels(BaseOperation.java:63)
    at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.deleteSeleniumPods(KubernetesContainerClient.java:402)
    at de.zalando.ep.zalenium.container.kubernetes.KubernetesContainerClient.initialiseContainerEnvironment(KubernetesContainerClient.java:348)
    at de.zalando.ep.zalenium.container.ContainerFactory.createKubernetesContainerClient(ContainerFactory.java:46)
    at de.zalando.ep.zalenium.container.ContainerFactory.getContainerClient(ContainerFactory.java:22)
    at de.zalando.ep.zalenium.proxy.DockeredSeleniumStarter.<clinit>(DockeredSeleniumStarter.java:63)
    ... 13 more
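Read bottom-up, the trace shows the NullPointerException coming from TreeMap.putAll. That call receives a null map inside fabric8's withLabels, which is called from deleteSeleniumPods. The earlier GET on the hub's own pod (zalenium-1-j6s4q) had just failed with a 403. A likely explanation is that the labels Zalenium wanted to filter by were never loaded, so the NPE is a follow-on symptom of the permission problem rather than a separate bug. The Java sketch below reproduces only that JDK behaviour. withLabels and selectorFor are hypothetical stand-ins, not Zalenium's or fabric8's code.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class NullLabels {

    // Stand-in for a client that copies caller-supplied labels into a sorted selector,
    // the same JDK call that appears at the top of the "Caused by" trace.
    static Map<String, String> withLabels(Map<String, String> labels) {
        Map<String, String> selector = new TreeMap<>();
        selector.putAll(labels); // NullPointerException when labels is null
        return selector;
    }

    // Fail-fast variant: name the likely root cause instead of surfacing a bare NPE.
    // Falling back to an empty map would be worse: an empty label selector matches
    // every pod in the namespace, so a cleanup call could delete unrelated pods.
    static Map<String, String> selectorFor(Map<String, String> hubPodLabels) {
        Objects.requireNonNull(hubPodLabels,
                "hub pod labels are missing; check whether reading the hub pod failed with 403");
        return withLabels(hubPodLabels);
    }

    public static void main(String[] args) {
        try {
            withLabels(null);
        } catch (NullPointerException e) {
            System.out.println("withLabels(null) -> NullPointerException, as in the log");
        }
        System.out.println(selectorFor(Map.of("app", "zalenium")));
    }
}
```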
-- ner0
Tags: kubernetes, openshift, selenium, selenium-grid, zalenium

1 Answer

3/19/2019
[OkHttp https://172.17.0.1/ ...] ERROR i.f.k.c.d.i.ExecWebSocketListener - Exec Failure: HTTP:403. Message:pods "zalenium-40000-wvpjb" is forbidden: User "system:serviceaccount:PROJECT:deployer" cannot get pods/exec in the namespace "PROJECT": User "system:serviceaccount:PROJECT:deployer" cannot get pods/exec in project "PROJECT"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'

Usually this means that the service account does not have enough rights: here "deployer" is not allowed to get "pods/exec" in the project. I would start by checking that.

-- Diego M.
Source: StackOverflow