Connection to Splash service on Kubernetes, GKE

3/8/2018

I have a Python controller that uses the scrapy-splash library to send SplashRequest objects to a Splash service.

Locally, I run the controller and the Splash service in two separate Docker containers.

yield SplashRequest(url=response.url, callback=parse, splash_url=<URL>, endpoint='execute', args=<SPLASH_ARGS>)

When I send the request locally with splash_url="http://127.0.0.1:8050", everything works fine.

Now I want to run Splash as a Kubernetes deployment and process the Splash requests in the cloud. I created a Splash Deployment and a Service with type=LoadBalancer on Google Cloud Kubernetes (GKE).

I send the Splash request to the external IP of the Splash service.

But Splash doesn't receive any request, and in the Python script I get:

twisted.python.failure.Failure twisted.internet.error.TCPTimedOutError: TCP connection timed out: 60: Operation timed out.

It worked in the past when I used the internal endpoint of the pod, but then I started to get a Missing schema exception because I didn't include http:// in the URL.
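The missing-scheme problem can be avoided by normalizing the address before it is passed as splash_url. The following is only a minimal sketch: ensure_scheme is a hypothetical helper, not part of scrapy-splash, and it assumes that plain http is the right default for a Splash endpoint.

```python
def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it already has a scheme, otherwise prefix one.

    Hypothetical helper for illustration; not part of scrapy-splash.
    """
    # A bare "host:port" string has no "://", so it needs a scheme added.
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


print(ensure_scheme("127.0.0.1:8050"))         # http://127.0.0.1:8050
print(ensure_scheme("http://127.0.0.1:8050"))  # http://127.0.0.1:8050
```

The result can then be used as the splash_url argument of SplashRequest.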

  • splash docker image scrapinghub/splash:3.2
  • Kubernetes version 1.7, (tried also on 1.9)

splash-deployment.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: my-app
  name: splash
  namespace: ns-app
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      labels:
        app: splash
    spec:
      containers:
      - image: scrapinghub/splash:3.2
        name: splash
        ports:
        - containerPort: 8050
        resources: {}
      restartPolicy: Always
status: {}

splash-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: app
  name: splash
  namespace: ns-app
spec:
  type: LoadBalancer
  ports:
  - name: "8050"
    port: 8050
    targetPort: 8050
    protocol: TCP
  selector:
    app: app
status:
  loadBalancer: {}

UPDATE: I noticed that when I open http://localhost:8050/ locally I see the Splash UI, while opening it via the Kubernetes IP I get:

refused to connect

How can I solve this? Thank you.

-- Ami Hollander
docker
kubernetes
scrapy-splash
splash-screen

2 Answers

3/9/2018

The problem is that the selector in splash-service.yaml is wrong. It should match the labels of the pods created by the Deployment (app: splash in the pod template), not app: app.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: app
  name: splash
  namespace: ns-app
spec:
  type: LoadBalancer
  ports:
  - name: "8050"
    port: 8050
    targetPort: 8050
    protocol: TCP
  selector:
    app: splash
status:
  loadBalancer: {}
-- Ami Hollander
Source: StackOverflow

3/9/2018

UPDATE: I have just noticed that you found the issue on your own, my bad.

I believe Ami Hollander is right that it is an issue with the label selector, but I would like to explain why.

Each time you create a service with a selector, an Endpoints resource is created as well. It is populated with the addresses of all pods whose labels match the selector. For a service without a selector, you can also create the endpoints manually to point to any IP, including external resources.
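The matching rule can be illustrated with a small sketch. This is a simplification of what Kubernetes does, not its actual implementation: selector_matches is a hypothetical function, and the label values are taken from the two YAML files in the question.

```python
def selector_matches(selector, pod_labels):
    """True when every key/value pair of the selector appears in the pod labels.

    Simplified illustration of equality-based selection; an empty selector
    is treated as matching nothing, since a Service without a selector gets
    no automatically managed endpoints.
    """
    if not selector:
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


pod_labels = {"app": "splash"}      # template.metadata.labels in splash-deployment.yaml
broken_selector = {"app": "app"}    # selector in the original splash-service.yaml
fixed_selector = {"app": "splash"}  # selector in the corrected splash-service.yaml

print(selector_matches(broken_selector, pod_labels))  # False
print(selector_matches(fixed_selector, pod_labels))   # True
```

Note that the labels under the Deployment's own metadata (app: my-app) play no role here; only the pod template labels are compared with the service selector.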

Kubernetes services can be exposed on external IPs that route to one or more cluster nodes. Traffic that ingresses into the cluster with the external IP (as destination IP), on the service port, will be routed to one of the service endpoints.

Therefore, as already pointed out, your selector was not matching any pod, so the Endpoints resource most likely contained no backend and there was nowhere to route the request. You can double-check this by running:

$ kubectl get endpoints --namespace ns-app
$ kubectl describe endpoints endpointname --namespace ns-app

It can be misleading because on the other hand if you run

$ kubectl get services

you will notice that the service has been created correctly and shows both a private and a public IP, but those IPs are simply a dead end.

  • The service looked healthy because it had been created without errors; the requests just had no backend to be routed to.
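To tell from the client side whether anything answers behind the service IP, a plain TCP probe is enough. This is a minimal sketch using only the Python standard library; can_connect is a hypothetical helper, and the 3 second timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, can_connect("<EXTERNAL_IP>", 8050) with the external IP of the service would be expected to return False while the endpoints list is empty, and True once the selector matches the Splash pod.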
-- GalloCedrone
Source: StackOverflow